Monday, October 24, 2016

Universal tendencies

Let’s say we find two languages displaying a common pattern, or two languages converging towards a common pattern, or even all languages doing the same. How should we explain this? Stephen Anderson (here, and discussed by Haspelmath here) notes that if you are a GGer there are three available options: (i) the nature of the input, (ii) the learning theory and (iii) the cognitive limits of the LAD (be they linguistically specific or domain general). Note that (ii) will include (iii) as a subpart and will have to reflect the properties of (i), but will also include all sorts of other features (cognitive control, structure of memory and attention, the number of options the LAD considers at one time, etc.). These, as Anderson notes, are the only options available to a GGer, for s/he takes G change to reflect the changing distribution of Gs in the heads of a population of speakers. Or, to put this more provocatively: languages don't exist apart from their incarnation in speakers’ minds/brains. And given this, all diachronic “laws” (laws that explain how languages or Gs change over time) must reflect the cognitive, linguistic or computational properties of human minds/brains.

This said, Haspelmath (H) observes (here and here) (correctly in my view) that GGers have long “preferred purely synchronic ways of explaining typological distributions,” and by this he means explanations that allude to properties of the “innate Language Faculty” (see here for discussion). In other words, GGers like to think that typological differences reflect intrinsic properties of FL/UG and that studying patterns of variation will hence shed light on its properties. I have voiced some skepticism concerning this “hence” here. In what follows I would like to comment on H’s remarks on a similar topic. However, before I get into details I should note that we might not be talking about the same thing. Here’s what I mean.

The way I understand it, FL/UG bears on properties of Gs, not on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might explain changes in Gs. Of course, I use the outputs of these Gs to try to discern the properties of the underlying Gs, but what I am interested in is G variation, not output variation. This concedes that one might achieve similar (identical?) outputs from different congeries of G rules, operations and filters. In effect, whereas changing surface patterns do signal some change in the underlying Gs, similarity of surface patterns need not. Moreover, given our current accounts there are (sadly) too many roads to Rome, so the fact that two Gs generate similar outputs (or have moved towards similar outputs from different Gish starting points) does not imply that they must be doing so in the same way. Maybe they are and maybe not. It really all depends.

Ok back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms that explain such changes, and he thinks that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?

Recall, we need to keep our questions clear. Say that we have identified an actual NUT (i.e. we have compelling evidence that certain kinds of G changes are “preferred”). If we have this and we find another G changing in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it. Well, in part: we have identified the kind of thing it is even if we do not yet know why these types of things exist.  An analogy: I have a pencil in my hand. I open my hand. The pencil falls. Why? Gravitational attraction. I then find out that the same thing happens when I have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and any other school supply at hand. I conclude that these falls are all instances of the same causal power (i.e. gravity). Have I explained why, when I pick up a thumbtack and let it loose, it too falls, namely because of gravity? Well, up to a point. A small point IMO, but a point nonetheless.  Of course we want to know how Gravity does this, what exactly it does when it does it and even why it does it the way that it does, but classifying phenomena into various explanatory pots is often a vital step in setting up the next step of the investigation (viz. identifying and explaining the properties of the alleged underlying “force”).

This said, I agree that the explanation is pretty lame if left like this. Why did X fall when I dropped it? Because everything falls when you drop it. Satisfied? I hope not.

Sadly, from where I sit, many explanations of typological difference or diachronic change have this flavor. In GG we often identify a parameter that has switched value and (more rarely) some PLD that might have led to the switch. This is devilishly hard to do right and I am not dissing this kind of work. However, it is often very unsatisfying given how easy it is to postulate parameters for any observable difference. Moreover, very few proposals actually do the hard work of sketching the presupposed learning theory that would drive the change or looking at the distribution of PLD that the learning theory would evaluate in making the change. To get beyond the weak explanations noted above, we need more robust accounts of the nature of the learning mechanisms and the data input to them (the PLD) that led to the change.[2] Absent this, we have an explanation of only a very weak sort.

Would H agree? I think so, but I am not absolutely sure of this. I think that H runs together things that I would keep separate. For example: H considers Anderson’s view that many synchronic features of a G are best seen as remnants of earlier patterns. In other words, what we see in particular Gs might be reflections of “the shaping effects of history” and “not because the nature of the Language Faculty requires it” (H quoting Anderson: p. 2). H rejects this for the following reason: he doesn’t see “how the historical developments can have “shaping effects” if they are “contingent” (p. 2). But why not?  What does the fact that something is contingent have to do with whether it can be systematically causal? 1066 and all that was contingent, yet its effects on “English” Gs have been long lasting. There is no reason to think that contingent events cannot have long lasting shaping effects.

Nor, so far as I can tell, is there reason to think that this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies might not explain “universal tendencies.” Here’s what I mean.

Let’s for the sake of argument assume that there are around 50 different parameters (and this number is surely small). If the parameters are binary and independent, this gives a space of 2^50 possible Gs, roughly 10^15 (a million billion). The current estimate of different languages out there (and I assume, maybe incorrectly, Gs) is on the order of 7,000, at least that’s the number I hear bandied about among typologists. This number is miniscule. It covers on the order of a billionth of a percent of the possible space. It is not inconceivable that languages in this part of the space have many properties in common purely because they are all in the same part of the space. These common properties would be contingent in a UG sense if we assumed that we only accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these properties. It is even possible that it is hard to get to any other of the G possibilities given that we are in this region.  On this sort of account, there might be many apparent universals that have no deep cognitive grounding and are nonetheless pervasive. Don’t get me wrong, I am not saying these exist, only that we really have no knock down reason for thinking they do not.  And if something like this could be true, then the fact that some property did or didn’t occur in every G could be attributed to the nature of the kind of PLD our part of the G space makes available (or how this kind of PLD interacts with the learning algorithm). This would fit with Anderson’s view: contingent yet systematic and attributable to the properties of the PLD plus learning theory.
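To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, 50 binary and independent parameters and roughly 7,000 attested languages, i.e. the figures stipulated above, neither of which is an empirical claim:

```python
# Back-of-the-envelope: how much of the possible G-space do attested languages cover?
# Assumes 50 binary, independent parameters (the stipulation above); purely illustrative.

num_parameters = 50
possible_grammars = 2 ** num_parameters      # 2^50, roughly 1.13e15 possible Gs
attested_languages = 7_000                   # rough figure bandied about among typologists

coverage = attested_languages / possible_grammars
print(f"Possible grammars: {possible_grammars:,}")
print(f"Fraction covered: {coverage:.2e} (about {coverage * 100:.0e} percent)")
# Fraction covered: 6.22e-12 (about 6e-10 percent)
```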

I don’t think that H (nor most linguists) would find this possibility compelling. If something is absent from 7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well maybe not. My only claim is that the basis for this confidence is not particularly clear. And thinking through this scenario makes it clear that gaps in the existing language patterns/Gs are (at best) suggestive about FL/UG properties rather than strongly dispositive.  It could be our ambient PLD that is responsible. We need to see the reasoning. Culbertson and Adger provide a nice model for how this might be done (see here).

One last point: what makes PoS arguments powerful is that they are not subject to this kind of sampling skepticism. PoS arguments really do, if successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoSs abstract away from PLD altogether and so remove this as a causal source of systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of course, the two kinds of investigation can be combined. However, it is worth keeping in mind that typological investigations will always suffer from the kind of sampling problem noted above and will thus be less direct probes of FL/UG than will PoS considerations. This suggests, IMO, that it would be very good practice to supplement typologically based conclusions with PoS style arguments.[3] Even better would be explicit learning models, though these will be far more demanding given how hard it likely is to settle on what the PLD is for any historical change.[4]

I found H’s discussion of these matters to be interesting and provocative. I disagree with many things that H says (he really is focused on languages rather than Gs). Nonetheless, his discussion can be translated well enough into my own favored terms to be worth thinking about. Take a look.

[1] I say ‘apparent’ for I know very little of this literature though I am willing to assume H is correct that these exist for the sake of argument.
[2] Which does not mean that we have nice models of what better accounts might look like. Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William Sakas, Charles Yang, a.o., have provided excellent models of what such explanations would look like.
[3] Again a nice example of this is Culbertson and Adger’s work discussed here. It develops an artificial G argument (meatier than a simple PoS argument) to more firmly ground a typological conclusion.

[4] Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example, shows.

Tuesday, October 18, 2016

Right sizing ling papers

I have a question: what’s the “natural” size of a publishable linguistics paper? I ask because after indulging in a reading binge of papers I had agreed to look at for various reasons, it seems that 50 pages is the assumed magic number. And this number, IMO, is too high.  If it really takes 50 pages for you to make your point, then either you are having trouble locating the point that you want to make, or you are trying to make too many of them in a single paper. Why care?

I care about this for two reasons. First I think that the size of the “natural” paper is a fair indicator of the theoretical sophistication of a field. Second, I believe that if the “natural” size is, say, 50 pages, then 50 pages will be the benchmark of a “serious” paper and people will aim to produce 50 page papers even if this means taking a 20 page idea and blowing it up to 50 pages. And we all know where this leads. To bloated papers that make it harder than it should be (and given the explosion of new (and excellent) research, it’s already harder than it used to be) to stay current with the new ideas in the field. Let me expand on these two points just a bit.

There are several kinds of linguistics papers. The ones that I am talking about would be classified as theoretical linguistics, specifically syntax. The aim of such a paper is to make a theoretical point. Data and argument are marshaled in service of making this point. Now, in a field with well-developed theory, this can usually be done economically. Why? Because the theoretical question/point of interest can be crisply stated and identified. Thus, the data and arguments of interest can be efficiently deployed wrt this identified theoretical question/point. The less theoretically firm the discipline, the harder it is to do this well and the longer (more pages) it takes to identify the relevant point etc.  This is what I mean by saying that the size of the “natural” paper can be taken as a (rough) indicator of how theoretically successful a field is. In the “real” sciences, only review papers go on for 50 pages. Most are under 10 and many are less than that (it is called “Phys Rev Letters” for a reason). In the “real” sciences, one does not extensively review earlier results. One cites them, takes what is needed and moves on. Put another way, in “real” sciences one builds on earlier results, one does not rehearse them and re-litigate them. They are there to be built on and your contribution is one more brick in a pretty well specified wall of interlocking assumptions, principles and empirical results.

This is less true in theoretical syntax. Most likely it is because practitioners do not agree as widely about the theoretical results in syntax as people in physics agree about the results there. But, I suspect, there is another reason as well. In many of the real sciences, papers don’t locally aim for truth (of course, every scientific endeavor globally does). Here’s what I mean.

Many theoretical papers are explorations of what you get by combining ideas in a certain way. The point of interest is that some combinations lead to interesting empirical, theoretical or conceptual consequences. The hope is that these consequences are also true (evaluated over a longer run), but the immediate assumption of many papers is that the assumptions are (or look) true enough (or are interesting enough even if recognizably false) to explore even if there are (acknowledged) problems with them. My impression is that this is not the accepted practice in syntax. Here if you start with assumptions that have “problems” (in syntax, usually, (apparent) empirical difficulties) then it is thought illegitimate to use these assumptions or further explore their consequences. And this has two baleful influences on paper writing: it creates an incentive to fudge one’s assumptions and/or creates a requirement to (re)defend them. In either case, we get pressure to bloat.

A detour: I have never really understood why exploring problematic assumptions (PA) is so regularly dismissed.[1] IMO, theory is that activity that explores how assumptions connect to lead to interesting consequences. That’s what theoretical exploration is. If done correctly, it leads to a modicum of explanation.

This activity is different from how theory is often described in the syntax literature. There it is (often) characterized as a way of “capturing” data. On this view, the data are unruly and wild and need to be corralled and tamed. Theory is the instrument used to pen them in. But if your aim is to “capture” the data, then capturing some while losing others is not a win. This is why PAs are non grata. Empirically leaky PAs are not interesting precisely because they are leaky. Note, then, that the difference between “capturing” and “explaining” is critical. Leaky PAs might be explanatorily rich even if empirically problematic. Explanation and data coverage are two different dimensions of evaluation. The aim, of course, is to get to those accounts that both explain and are empirically justified. The goal of “capture” blurs these two dimensions. It is also, IMO, very counterproductive. Here’s why.

Say that one takes a PA and finds that it leads to a nice result, be it empirical or theoretical or conceptual. Then shouldn’t this be seen as an argument for the PA regardless of its other problems? And shouldn’t this also be an argument that the antecedent problems the PA suffers from might possibly be apparent rather than real? All we really can (and should) do as theorists is explore the consequences of sets of assumptions. One hopes that over time the consequences as a whole favor one set over others. Hence, there is nothing methodologically inapposite in assuming some PA if it fits the bill. In fact, theoretically speaking it is a virtue, for it allows us to explore that idea more fully and to see whether we can understand why, even if false, it seems to be doing useful work.

Let’s now turn to the second, more pragmatic point. There has been an explosion of research in syntax. It used to be possible to keep up with everything by reading it. I don’t believe that this is still possible. However, it would be easier to stay tuned to the important issues if papers were more succinct. I think I’ve said this on FOL before (though I can’t recall where), but I have often found it to be the case that a short form version of a later published paper (say a NELS or WCCFL version) is more useful than the longer, more elaborated descendant.[2] Why? Because the longer version is generally more “careful,” and not always in a good way. By this I mean that there are replies to reviewers that require elaboration but that often obscure the main idea. Not always, but often enough.

So as not to end on too grumpy a note, let me suggest the following template for syntax papers. It answers three questions: What’s the problem? Why is it interesting? How to solve it?

The first section should be short and to the point. A paper that cannot identify a crisp problem is one that should likely be rewritten.

The second section should also be short, but it is important. Not all problems are equally interesting. It’s the job of a paper to indicate why the reader should care. In linguistics this means identifying how the results bear on the structure of FL/UG. What light does your question, if answered, shed on the central question of modern GG, the fine structure of FL?

The last section is the meat, generally. Only tell the reader enough to understand the answer being offered to the question. For a theory paper, raw data should be offered, but the discussion should proceed by discussing the structures that these data imply. GGers truck in grammars, which truck in rules and structures and derivations. A theory paper that is not careful and explicit about these is not written correctly. Many papers in very good journals take great care to get the morphological diacritics right in the glosses but often eschew providing explicit derivations and phrase markers that exhibit the purported theoretical point. For GG, God is not in the data points, but in the derivations etc. that these data points are in service of illuminating.

Let me go a bit over the top here. IMO, journals would do well to stop publishing most data, reserving it for methods addenda available online. The raw data are important, and the exposition should rely on them and make them available, but it should advert to them, not present them. This is now standard practice in journals like Science and there is no reason why it should not be standard practice in ling journals too. It would immediately cut down the size of most articles by at least a third (try this for a typical NLLT paper, for example).

Only after the paper has offered its novelties should one compare what’s been offered to other approaches in the field. I agree that this suggestion should not be elevated to a hard and fast rule. Sometimes a proposal is usefully advanced by demonstrating the shortcomings in others that it will repair. However, more often than not comparisons of old and new are hard to make without some advance glimpse of the new. In my experience, comparison is most useful after the fact.

Delaying comparison will also have another positive feature, I believe. A proposal might be interesting even if it does no better than earlier approaches. I suspect that we put “problems” with extant hypotheses up front because it is considered illicit to offer an alternative unless the current favorite is shown to be in some way defective. There is a founder prejudice operative that requires that the reigning champion not be discomfited unless proven to be inferior. But this is false. It is useful to know that there are many routes to a common conclusion (see here for discussion). It is often even useful to have an alternative that does less well.

So: What, Why, How, with a 15-20 page limit, with the hope of lowering this to 10-15. If that were to happen I would feel a whole lot guiltier for being so far behind in my reading.

[1] Actually, I do understand. It is a reflex of theoretical syntax’s general anti-theory stance.
[2] This might be showing my age, for I think that it is well nigh impossible nowadays to publish a short version of a paper in a NELS or WCCFL proceedings and then an elaborated version in a more prestigious journal. If so, take it from me!

Thursday, October 6, 2016

An addendum to the previous post

I want to make two more observations concerning Berlinski and Uriagereka's (B&U) review of classical case theory.

First, as they emphasize, correctly IMO, what made the Vergnaud theory so interesting as regards explanatory adequacy was that it was not signaled by surface features of DPs in so many languages (e.g. English and Chinese). In other words, it was not WYSIWYG. If it held then it could not be reasonably acquired simply by tracking surface morphology. This is what made it a UG candidate and why it bore on issues of explanatory adequacy. In other words, it was a nice example of PoS thinking: you know it despite no PLD to motivate it, hence it is part of FL/UG. Again, it is the absence of surface reflexes of the principle that made it interesting. As B&U puts it:
Deep down, case is compelling because linguistics has become a part of the Galilean undertaking, a way of explaining what is visible by an appeal to what is not. 
Not being "visible" is the key here.

Second, B&U notes how P&P models were influenced by the work of Monod and Jacob on the operon. Indeed, I would go further: the kind of work that microbiologists were doing was taken to serve as a good model of how work on language could proceed, and Case theory as Vergnaud envisaged it was a nice example of this thinking. Here's what I mean.

The operon was discovered by research on very simple bacteria and the supposition was made that how it worked there was how it worked everywhere. Its logic extends from bacteria to butterflies, chickens, lions, whales, worms etc. In other words, reasoning based on a very simple organism was taken to illuminate how very different organisms organized their microbiology. And all of this without replicating the work on butterflies, mice, whales etc.  Applied to linguistics, this reasoning allows inferences from the intensive study of one language to apply, prima facie, to all. Indeed, the PoS argument licenses this kind of inference, which is why it is such an interesting and powerful form of argument.

Why do I mention this? Because linguists nowadays don't really believe this. Evidence that we don't can be seen in our reactions to critics (like Everett, Evans, Wolfe, Tomasello, etc.). A staple of GG criticism is that it is English centric. The supposition behind this criticism is that one cannot legitimately say anything about FL/UG based on the study of a smattering of languages. To talk about FL/UG responsibly requires studying a broad swath of different languages, for only in so doing is one licensed to make universal inferences. We reply to the critics by noting how much current linguistic work is typological and cross linguistic and that so many people are working on so many different kinds of languages. But why is this our only retort? Why not say that one can gain terrific insight into FL/UG by studying a single language? Why the requirement that any claim be founded on masses of cross linguistic investigation?

Note that this is exactly what Monod and Jacob did not do. Nor do microbiologists do so today. Microbiologists study a handful of model organisms and from these we infer laws of biology. That is deemed ok in biology but not linguistics. Why? Why do linguists presuppose that only the extensive study of a wide number of different languages will allow insight into FL/UG? It's not the history of the field, so far as I can tell.

B&U shows how classical case theory arose. Similar stories can be told for virtually every other non-trivial theory within linguistics. It arose not via the study of lots of languages but by trying to understand simple facts within a small number in some deep way. This is how bounding theory arose, the ECP, binding and more. So why the presupposition (visible in the replies we give to our critics that we do, really really do, study more than just English) that cross linguistic typological investigations are the only sure way to investigate FL/UG?

I think that I know one answer: we don't really know much about FL/UG. In other words, many linguists will reply that our claims are weak. I don't buy this. But if you do, then it is not clear why GG's critics upset you with their claims. Is it that they are saying out loud what you believe but don't think should be shared in polite company?

Wednesday, October 5, 2016

A zesty intro to the logic of explanatory adequacy with a nod to J-R Vergnaud

David Berlinski and Juan Uriagereka have written a very readable (and amusing) history of abstract case theory (here). It is also very pedagogical, for it focuses on how Vergnaud's proposal regarding abstract case enhanced the explanatory power of UG, and it does this by showing how Chomsky-Lasnik filters were a vast improvement over ordered rules with all of their intricacies and how case theory was a big conceptual improvement over filters like *[NP to VP].  It is all very nicely done (and quite funny in a very dry sort of way).

The story ends with the observation that ECM is, well, "exceptional" and suggests, coyly, that this raises interesting issues. It does. One of the nicest results of recent minimalist theory, IMO, was Lasnik and Saito's (L&S) regularization of Postal's scope facts wrt ECM subjects in the context of a theory of case that junks government and replaces it with something like the old spec-head configuration. What L&S show is that given such a theory, one that the earliest versions of MP promoted, we would expect a correlation between (abstract) case value and the scope of the case-assigned DP.  Postal's data, L&S argued, showed exactly that. This was a wonderful paper and one of the first really interesting results of minimalist logic.

As you all know, this result fit ill with the move to Agree-based Probe-Goal conceptions of case licensing (after all, the whole idea of the L&S theory is that the DP had to move to a higher position in order to get case licensed, and this movement expanded its scope horizons). Chomsky's more recent ideas concerning labeling might force object movement as well and so reclaim the Postal facts, though not within the domain of the theory of case.  At any rate, all of this is to indicate that there are further interesting theoretical movements prompted by Vergnaud's original theory even to the present day. And yes, I know that there are some who think that it was entirely on the wrong track, but even they should appreciate the Berlinski-Uriagereka reconstruction.

Form and function; the sources of structure

I just read a fascinating paper and excellent comment thereupon in Nature Neuroscience (thx to Pierre Pica for sending them along) (here and here). The papers make points that, as I will argue below, illuminate two very different views of what structure is and where it comes from.  The two views have names that should now be familiar to you: Rationalism (R) and Empiricism (E). What is interesting about the two papers discussed below is that they indicate that R and E are contrasting philosophical conceptions that have important empirical consequences for very concrete research. In other words, R and E are philosophical in the best sense, leading to different conceptions with testable empirical (though not Empiricist) consequences. Or, to put this another way, R and E are, broadly speaking, research programs pointing to different conceptions of what structure is and how it arises.[1]

Before getting into this more contentious larger theme, let’s review the basic findings. The main paper is written by the conglomerate of Saygin, Osher, Norton, Youssoufian, Beach, Feather, Gaab, Gabrieli and Kanwisher (henceforth Saygin et al). The comment is written by Dehaene and Dehaene-Lambertz (DDL). The principal finding is, as the title makes admirably clear, that “connectivity precedes function in the development of the visual word form area.” What’s this mean?

Saygin et al observes that the brain is divided up into different functional regions and that these are “found in approximately the same anatomical location in virtually every normal adult”.[2] The question is how this organization arises: “how does a particular cortical location become earmarked”? (Saygin et al:1250). There are two possibilities: (i) the connectivity follows the function or (ii) the function follows the connectivity. Let’s expand a bit.

(i) is the idea that a region of brain wires up with another region of brain in virtue of what each does at roughly the same time. This is roughly the Hebbian idea that regions that fire together wire together (FTWT). So, a region that is sensitive to certain kinds of visual features (e.g. the Visual Word Form Area (VWFA)) hooks up with an area where “language processing is often found” (DDL:1193) to deliver a system that undergirds reading (coding a dependency between “sounds” and “letters”/”words”).

(ii) reverses the causal flow. Rather than intrinsic functional properties of the different regions driving their connectivity (via concurrent firing), the extrinsic connectivity patterns of the regions drive their functional differentiation. To coin a phrase: areas that are wired together fire together (WTFT). This is what Saygin et al finds:

This tight relationship between function and connectivity across the cortex suggests a developmental hypothesis: patterns of extrinsic connectivity (or connectivity fingerprints) may arise early in development, instructing subsequent functional development.

The “may” is upgraded to a “does” by following young kids before and after they learn to read. As DDL summarizes it (1193):

To genuinely test the hypothesis that the VWFA owes its specialization to a pre-existing connectivity pattern, it was necessary to measure brain connectivity in children before they learned to read. This is what Saygin et al. now report. They acquired diffusion-weighted images in children around the age of 5 and used them to reconstruct the approximate trajectory of anatomical fiber tracts in their brain. For every voxel in the ventral visual cortex, they obtained a signature profile of its quantitative connectivity with 81 other brain regions. They then examined whether a machine-learning algorithm could be trained to predict, from this connectivity profile, whether or not a voxel would become selective to written words 3 years later, once the children had become literate. Finally, they tested their algorithm on a child whose data had not been used for training. And it worked: prior connectivity predicted subsequent function (my bold, NH). Although many children did not yet have a VWFA at the age of 5, the connections that were already in place could be used to anticipate where the VWFA would appear once they learned to read.

I’ve bolded the conclusion: WTFT and not FTWT. What makes the Saygin et al results particularly interesting is their precision. Saygin et al is able to predict the “precise location of the VWFA” in each kid based on “the connectivity of this region even before the functional specialization for orthography in the VWFA exists” (1254). So voxels that are not sensitive to words and letters before kids learn to read, become so in virtue of prior (non functionally based) connections to language regions.
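An aside for the computationally inclined: the logic of that prediction step is easy to state. The sketch below is not Saygin et al's actual pipeline (their features come from diffusion-weighted imaging and their modeling details differ); it is a minimal, hypothetical illustration, on synthetic data, of "predict a voxel's later word-selectivity from its earlier connectivity fingerprint, testing on a held-out child":

```python
# Hypothetical sketch of the prediction logic (NOT Saygin et al's actual pipeline):
# each voxel gets an 81-dimensional connectivity "fingerprint" measured pre-literacy,
# and we try to predict whether that voxel becomes word-selective after reading is learned.
# Train on all children but one, test on the held-out child.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_children, n_voxels, n_regions = 10, 200, 81

# Synthetic stand-in data: connectivity fingerprints and later word-selectivity labels.
fingerprints = rng.normal(size=(n_children, n_voxels, n_regions))
selective_later = (fingerprints[..., 0] + 0.5 * rng.normal(size=(n_children, n_voxels))) > 1.0

accuracies = []
for held_out in range(n_children):
    train = [c for c in range(n_children) if c != held_out]
    X_train = fingerprints[train].reshape(-1, n_regions)
    y_train = selective_later[train].reshape(-1)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Test on the child whose data was never used for training.
    accuracies.append(clf.score(fingerprints[held_out], selective_later[held_out]))

print(f"Mean held-out accuracy: {np.mean(accuracies):.2f}")
```

The point of the leave-one-child-out step is the same as in the study: if connectivity measured before literacy carries no information about later function, held-out accuracy should hover at chance.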

Some remarks before getting into the philosophical issues.

First, getting to this result requires lots of work, both neuro imaging work and good behavioral work. This paper is a nice model for how the two can be integrated to provide a really big and juicy result.

Second, this appeared in a really fancy journal (Nature Neuroscience) and one can hope that it will help set a standard for good cog-neuro work, work that emphasizes both the cognition and the neuroscience. Saygin et al does a lot of good cog work to show that in non-readers the VWFA is not differentially sensitive to letters/words even though it comes to be so sensitive after kids have learned to read.

Third, DDL points out (1192-3) that whatever VWFA is sensitive to, it is not simply visual features (i.e. a bias for certain kinds of letter-like shapes).  Why not? Because (i) the region is sensitive to letters and not numerals despite letters and numerals being formed using the same basic shapes and (ii) VWFA is located in the same place in blind subjects and non-blind ones so long as the blind ones can read braille or letters converted into “synthetic spatiotemporal sound patterns.” As DDL coolly puts it:

This finding seems to rule out any explanation based on visual features: the so-called ‘visual’ cortex must, in fact, possess abstract properties that make it appropriate to recognize the ‘shapes’ of letters, numbers or other objects regardless of input modality.

So, it appears that what VWFA takes as a “shape” is itself influenced by what the language area would deem shapely. It’s not just two perceptual domains with their independently specifiable features getting in sync, for what even counts as a shape depends on what an area is wired to. VWFA treats a “shape” as letter-like if it tags a “shape” that is languagy.

Ok, finally time for the sermon: at the broadest level E and R differ in their views of where structure comes from and its relation to function.

For Es, function is the causal driver and structure subserves it. Want to understand the properties of language, look at its communicative function. Want to understand animal genomes, look at the evolutionarily successful phenotypic expressions of these genomes. Want to understand brain architecture, look at how regions function in response to external stimuli and apply Hebbian FTWT algorithms.  For Es, structure follows function. Indeed, structure just is a convenient summary of functionally useful episodes. Cognitive structures reflect shaping effects of the environmental inputs of value to the relevant mind. Laws of nature are just summaries of what natural objects “do.” Brain architectures are reflections of how sensory sensitive brain regions wire up when concurrently activated. Structure is a summary of what happens. In short, form follows (useful) function.

Rs beg to differ. Rs understand structure as a precondition of function. It doesn’t follow function but precedes it (put Descartes before the functional horse!). Function is causally constrained by form, which is causally prior. For Rs, the laws of nature reflect the underlying structure of an invisible real substrate. Mental organization causally enables various kinds of cognitive activity. Linguistic competence (the structure of FL/UG and the structure of individual Gs) allows for linguistic performances, including communication and language acquisition. Genomic structure channels selection.  In other words, function follows form. The latter is causally prior. Structure “instructs” (Saygin et al’s term) subsequent functional development.

For a very long time, the neurosciences have been in an Empiricist grip. Saygin et al provides a strong argument that the E vision has things exactly backwards and that the Hebbian, E-ish connectionist conception is likely the wrong way of understanding the neural and functional structure of the brain.[3] Brains come with a lot of extrinsic structure and this structure causally determines how they organize themselves functionally. Moreover, at least in the case of the VWFA, Darwinian selection pressures (another kind of functional “cause”) will not explain the underlying connectivity. Why not? Because as DDL notes (1192), alphabets are around 3800 years old and “those times are far too short for Darwinian evolution to have shaped our genome for reading.” That means that Saygin et al’s results will have no “deeper” functional explanations, at least as concerns the VWFA. Nope, it’s functionally inexplicable structure all the way down. Connectivity is the causal key. Function follows. Saygin et al speculate that what is right for the VWFA will hold for brain organization more generally. Is the speculation correct? Dunno. But being a card carrying R you know where I’d lay my bets.

[1] I develop this theme in an article here.
[2] This seems like a pretty big deal to me and argues against any simple minded view of brain plasticity, I would imagine. Maybe any part of the brain can perform any possible computation, but the fact that brains regularly organize themselves in pretty much the same way seems to indicate that this organization is not entirely haphazard and that there is method behind it. So, if it is true that the brain is perfectly plastic (which I really don’t believe), then this suggests that computational differences are not what is responsible for its large scale functional architecture. Saygin et al suggest another causal mechanism.
[3] In this it seems to be reprising the history of immunology which moved from a theory in which the environment instructed the immune system to one in which the structure of the immune system took causal priority. See here for a history.

Sunday, October 2, 2016

More on the irrelevance of Everett's research to UG, and a long critical review of Wolfe's lousy book

Here are two pieces you might find interesting.

First, Chomsky has recently been interviewed by the NYT about the Everett/Wolfe junk we have spent far too much time on (Thx to Chomsky for the quote). Here he is asked about Everett's work.

You have mentioned one paragraph that Wolfe got right in his book...what was in that paragraph? Was it an explanation of your work? Why do you think we're seeing this resurgence of analysis? You must get fairly tired of defending your work?!
It was a paragraph in which he quoted my explanation to him of why his crucial example, the Amazonian language Piraha, is completely irrelevant to his conclusions and claims.  The reason is quite simple.  Whether the alleged facts about the language are correct or not (apparently not), they are about the language, not the faculty of language, while the general principles he thinks are being challenged have to do with the faculty of language, explicitly and unambiguously, including the work he cites, and speakers of this language share the common human faculty of language, of course, as illustrated by their fluency in Portuguese.  So his entire article and book are irrelevant to his claims.  To take an analogy, if some tribe were found in which everyone wears a black patch over one eye, it would have no bearing on the study of binocular vision in the human visual system.  The problems in this work extend far beyond the total irrelevance of his examples to his claims, but I won’t elaborate here. I’ve been defending the legitimacy of this work, extensively and in print, for 60 years.  In earlier years the discussion were with serious philosophers, linguists, cognitive scientists.  I’m sorry to see that the resurgence you mention does not begin to approximate that level, one reason why unlike earlier years, I don’t bother to respond unless asked.
I have italicized the most important point: it does not matter if Everett is right, because his claims are irrelevant even if correct. This is the critical point and one that has, sadly, been obscured in most discussions. Not that the point has not been made. It has been. Rather, the point is quickly made and then the falsity of Everett's claims is discussed at length. This leaves the appearance that the empirical issues matter to the big point at hand. How? Because the space dedicated to arguing for their falsity swamps that dedicated to their irrelevance. People conclude that if it really didn't matter then why spend all that time worrying about whether the claims are true? Chomsky's reply is economical and to the point. My suggestion: if you really want to debate the ins and outs of Piraha, do so in a separate very technical paper that makes clear that it has nothing to do with UG.

Here is a second review online of Wolfe's book brought to my attention by Peter Ludlow. It is very amusing. I especially like the last paragraph for it identifies the real scandal in all of this.
I’m not worried about Chomsky, however, no more than I’m worried about Darwin’s position in future histories of science. Chomsky’s position too will be just fine. I do worry about how we will look in those histories, however. Because from where I sit the rampant anti-intellectual responses and the failures to distinguish nonsense from solid science in the attacks on Chomsky’s work look more like harbingers of a new dark age, one that rejects thoughtful scientific probes into human nature and levels charges of a new kind of apostasy– the apostasy of using one’s mind instead of gut instinct. And I suspect that, from the perspective of future intellectual historians, Chomsky’s ability to produce this last great piece of work against the backdrop of our new dark age will make his achievements seem all the more impressive.
The shoddiness of our high brow press is the real scandal, and it is one that deserves much more serious attention.