Monday, May 5, 2014

What has the SMT done for you lately?

A recent post (here) illustrated how the SMT provided a unified framework for various kinds of research into the structure of FL. In particular, I reviewed some work showing how certain recent findings concerning online parsing follow if parsers transparently embed grammars, as the SMT would require. The flow of argument in the work described goes from results in syntax to consequences for online measures of incremental parsing complexity. In other words, this is a case where the SMT, given some properties of the grammar, makes claims about some property of the interface. Here’s a question: can we reverse the direction of argument? Can we find cases where the SMT does grammatically useful work, in that some property of the interface makes a claim about what grammars must look like? In other words, where the argument moves from some property of the interfaces to some claims about the right theory of grammar? 

Before offering some illustrations, let me note that the first kind of argument is nothing to sneeze at if you are interested in discovering the structure of FL (and who isn’t interested in this?). Why? Because the kind of evidence that comes from things like the filled-gap effect and the plausibility effect is different from the kind of data that acceptability (under an interpretation) judgments provide. And, as every intro philo of science course will tell you, the best support for a theory comes from different kinds of data all pointing to the same conclusion (this is called consilience (a term Whewell invented)). Consequently, finding online data that supports conclusions garnered from acceptability data is interesting even if one is mainly interested in competence theories.

This said, for purely selfish reasons, it would still be nice to have examples of arguments going in the other direction as well: implications for grammatical theory from psycho considerations. I have three concrete(ish) examples to offer as models: one that I have talked about before (here), based on the work by Pietroski, Lidz, Hunter and Halberda (PLHH), but would like to remind you of; one on how to understand binding domains, based on work by Dave Kush (here); and one that is entirely self-serving (i.e. based on some work I did on non-obligatory control) (here ch. 6).

Before proceeding, let me emphasize that the examples are meant to be illustrative of the logic of the SMT. I do actually think that the cited arguments are pretty compelling (of course I LOVE the third one). However, my point here is not to defend their truth but to outline their logic and how this relates to SMT reasoning. Being motivated by the SMT does not imply that an account is true. But given minimalist interests, it is an interesting property for an account to have. This out of the way, let’s consider some cases.

First, PLHH’s argument concerning the meaning of ‘most.’ The argument is that there is a privileged representational format for the meaning of ‘most.’ Its meaning is (1c), and not the truth-functionally equivalent (1a) or (1b):

            (1) Three possible meanings for ‘most.’
                        a.   OneToOnePlus[{x: D(x) & Y(x)}, {x: D(x) & ¬Y(x)}], i.e. for some set s, s ⊂ {x: D(x) & Y(x)} and OneToOne[s, {x: D(x) & ¬Y(x)}]
                        b.   |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|
                        c.   |{x: D(x) & Y(x)}| > |{x: D(x)}| − |{x: D(x) & Y(x)}|

Why (1c)? Because that’s the one that speakers use when evaluating the quantities of visually presented dot arrays. And if one assumes that the products of well-designed grammars (e.g. meanings) are transparently used by the interfaces, i.e. if one assumes that the SMT is true, then the fact that the visual system uses representations like (1c) in preference to those in (1a,b), even when the others could be used, is evidence that this is what ‘most’ means. In other words, given the SMT, the fact that (1c) is used (and used very efficiently and quickly (see the experiments)) implies that (1c) is the linguistic meaning of ‘most.’
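To make the contrast concrete, here is a toy sketch of the three truth-conditionally equivalent verification procedures. This is not PLHH’s code or stimuli; all function names and the dot-array stand-in are invented for illustration. The point is only that (1a–c) agree on truth values while prescribing different computations: pairing items off one-to-one, comparing two independently computed cardinalities, or using just the total and one subset cardinality plus subtraction.

```python
import random

def most_one_to_one(ds, ys):
    """(1a): pair the D&Y items one-to-one against the D&not-Y items;
    'most' is true iff the former outlast the latter."""
    dy = [x for x in ds if x in ys]      # {x: D(x) & Y(x)}
    dn = [x for x in ds if x not in ys]  # {x: D(x) & not-Y(x)}
    while dy and dn:                     # discharge one-to-one pairs
        dy.pop(); dn.pop()
    return bool(dy)                      # leftovers on the D&Y side?

def most_two_cardinalities(ds, ys):
    """(1b): compute |D & Y| and |D & not-Y| separately, then compare."""
    return len({x for x in ds if x in ys}) > len({x for x in ds if x not in ys})

def most_subtraction(ds, ys):
    """(1c): use only |D| and |D & Y|; |D & not-Y| is derived by subtraction.
    This is the format the visual number system appears to exploit."""
    d, dy = len(set(ds)), len({x for x in ds if x in ys})
    return dy > d - dy

# Toy "dot array": 20 dots, 12 of them yellow.
random.seed(0)
dots = set(range(20))
yellow = set(random.sample(sorted(dots), 12))
assert most_one_to_one(dots, yellow) == most_two_cardinalities(dots, yellow) == most_subtraction(dots, yellow) == True
```

Since all three procedures return the same truth value on any finite scene, behavioral evidence about *how* quantities are estimated (not *whether* the answer is right) is what distinguishes them; that is exactly the leverage the SMT gives the experiments.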

Consider a second case with similar logic. In his thesis and in recent presentations (here), Dave Kush observes that speakers respect c-command restrictions when parsing sentences that involve quantificational binding. More specifically, in parsing sentences like (2a,b), speakers look for antecedents only within the c-command domain (CCD) of the bound pronoun. While parsing, speakers reliably distinguish cases like (2a), where the antecedent c-commands the bound pronoun, from those like (2b), where it doesn’t.

            (2) a. Kathi didn’t think that any janitor1 liked his job when he1 had to clean up
                  b. Kathi didn’t think that any janitor1 liked his job but he1 had to clean up       

Parsing sensitivity to CCDs is further buttressed by the difference found in the online parsing of Strong vs Weak Crossover (S/WCO) effects. Kush provides evidence that incremental parsing respects the former, which invokes CCDs, but not the latter, which does not.[1] As Kush notes, this fits well with earlier work on the binding of reflexives and reciprocals. Kush adds some Hindi data on reciprocals to earlier work by Dillon and Sturt on reflexives to ground this conclusion. Taking these various results together, Kush concludes, very reasonably IMO, that online parsing is sensitive to the c-command relations that bound expressions have wrt their antecedents.

The conclusion, then, is that incremental parsing computes CCDs in real time. Taking this as established, Kush then asks a second very interesting follow-up question: how is this condition implemented in human parsers? He notes the following problem. Human memory architecture appears to be content addressable, and this makes coding CCDs within such an architecture difficult.[2] However, the data clearly indicate that we code something like CCDs and do so quickly online. So how is this done? Kush suggests that we do not actually code for CCDs but for something that does similar work, something very like clausemates, the restriction that did the heavy lifting in previous incarnations of syntactic theory. Howard Lasnik and Ben Bruening have recently argued for a return to something like this (Bruening has proposed “phase-command” rather than c-command as the operative condition). Interestingly, as Kush shows, these alternatives to c-command can be made to fit comfortably with the kinds of content-addressable memory architectures humans seem to have. Conclusion: our competence grammars use something like clause/phase-command conditions rather than CCDs as the primitive relations relevant to binding. Note that the direction of argument goes from online parsing facts plus facts about human memory architecture to claims about the primitive relations in the competence grammar. What’s of interest here is how the SMT is critical in licensing the argument form. Whether Kush is right or not about the conclusion he draws is, of course, important. But IMO, this mere factual issue is not nearly as interesting as the argument form itself.
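To see why content addressability favors clausemate/phase-style conditions, here is a toy sketch of cue-based retrieval, loosely in the spirit of ACT-R-style models but emphatically not Kush’s or Lewis’s actual implementation; every item, feature, and name below is invented. The key property: retrieval matches cues against flat feature bundles in parallel and cannot walk a tree.

```python
# Toy content-addressable memory: each encoded item is a flat feature bundle.
memory = [
    {"word": "Kathi",   "cat": "DP", "clause": 0},
    {"word": "janitor", "cat": "DP", "clause": 1},
    {"word": "job",     "cat": "NP", "clause": 1},
]

def retrieve(cues, memory):
    """Return every item whose feature bundle matches all cues.
    Retrieval sees only the bundles; a relational property like
    'c-commands the current position' is not statable as a cue
    unless it has somehow been precompiled into a feature."""
    return [item for item in memory
            if all(item.get(k) == v for k, v in cues.items())]

# A clausemate/phase-command condition reduces to a single feature cue:
candidates = retrieve({"cat": "DP", "clause": 1}, memory)
assert [c["word"] for c in candidates] == ["janitor"]
```

C-command, by contrast, is defined over the geometry of the whole tree, which these flat bundles simply do not encode; that is the mismatch Kush exploits to argue for clause/phase-command as the grammatical primitive.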

Let me end with a third example, one from some of my own work. As some of you may know, I have done some work on Control phenomena. With many colleagues (thx Jairo, Cedric, Masha, Alex), I have argued that there exists a theory of control, the Movement Theory of Control (MTC), that has pretty good empirical coverage and can effectively be derived given certain central tenets of the Minimalist Program (MP). In particular, once one eliminates D-structure in toto and treats Move as a species of Merge, the MTC is all but inevitable. None of this is to say that the MTC is empirically correct, but it does mean that it is a deeply minimalist theory. I would go further (and indeed I have) and argue that the MTC is the only deeply minimalist theory of control, and if something like it is incorrect then either MP is wrong (at least for this area of grammar) or control phenomena are not part of FL (here’s a good place to wave hands about properties of the interface). Why do I mention this? Because the MTC is a theory of obligatory control (OC) and, as we all know, this is not the end of the control menagerie. There is non-obligatory control (NOC) as well. What does the MTC have to say about this?

Well, not that much actually.[3] Here’s what MTCers have said: it’s the by-product of having a pro in a subject position rather than a PRO (viz. an “A-trace”). And this proposal creates problems for the MTC. How?

Well, to get the data to fall out right, any theory of control must assume that, given a choice between an OC and an NOC configuration, grammars prefer OC.[4] In the context of the MTC this translates into saying that grammars prefer OC-style movement to pro binding. Let’s call this a preference for Move over Bind. This sets up the problem. Here it is.

The MTC explains cases like (3a) on the assumption that the gap in the lowest clause is a product of movement (i.e. an “A-trace”). But what prevents a representation like (3b) with a pro in place of the trace thereby licensing the indicated unavailable interpretation? Nothing, and this is a problem.

            (3) a. John1 expects Mary2 to regret PRO2/*1 shaving himself1
                  b. John1 expects Mary2 to regret pro1 shaving himself1

The SMT provides a possible solution (this is elaborated in detail here ch. 6). Given the SMT, parsers respect the distinctions grammars make. Thus, parsers must also prefer treating ecs as A-traces rather than pros if they can. So, in parsing a sentence like (4), the parser prefers treating the ec as an A-trace/Copy rather than a pro. But if so, this A-trace must find a (very) local antecedent. Mary fits the bill; John cannot (taking John as antecedent would violate minimality).

            (4) John expects Mary to regret ec shaving himself

Given this line of reasoning, (3b) above is not a possible parse of the indicated sentence and so the sentence is judged unacceptable. Note that this account relies on the SMT: the parser must cleave to the contours the grammar lays out. Thus, given the grammatical preference for Move over Bind we cannot parse the ec in (4) as a pro and so the structure in (3b) is in principle unavailable to the parser.
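The parsing logic just described can be caricatured in a few lines. This hypothetical resolver is not an implemented parser, just a sketch encoding two assumptions from the text: the Move-over-Bind preference (try the trace analysis first) and minimality (a trace must take the closest antecedent); the function name and interface are invented.

```python
def resolve_ec(antecedents, in_island=False):
    """Toy resolution of an empty category (ec) under Move-over-Bind.

    antecedents: candidate DPs ordered from closest to furthest from the ec.
    Returns (analysis, antecedent). A trace analysis is tried first and,
    by minimality, is forced onto the closest candidate; pro is available
    only where movement is blocked, e.g. inside an island."""
    if not in_island and antecedents:
        return ("trace", antecedents[0])   # OC: closest antecedent forced
    return ("pro", None)                   # NOC: free (possibly discourse) reference

# (4) John expects Mary to regret ec shaving himself
#     Candidates ordered by distance from the ec: Mary, then John.
assert resolve_ec(["Mary", "John"]) == ("trace", "Mary")  # 'John' is never reachable

# Footnote-3-style NOC: John said that [ec kissing Mary] would upset Bill.
# The ec sits inside an island, so the trace analysis is unavailable.
assert resolve_ec(["Bill", "John"], in_island=True) == ("pro", None)
```

The derivation of the judgment on (3b) is then just the first assertion: because the trace analysis always wins where it is available, the pro parse of (4) never arises, so the long-distance reading is unparsable rather than ungrammatical.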

Note that this logic only applies to phonetically null pronouns. The parser need not decide on the status of a phonetically overt pronoun, hence the acceptability of (5) with the same binding relations we were considering in (3b):

            (5) John1 expects Mary to regret him1 shaving himself1

I don’t expect anyone to believe this analysis (well, not as stated here. Once you read the details you will no doubt be persuaded). Indeed, I have had a hard time convincing very many that the MTC is on the right track at all. But, for the nonce I just want to note that the logic deployed above illustrates another use of SMT reasoning. Let’s review.

Given the SMT, there are strong ties between what parsers do and what grammars prescribe. One can argue from the properties of one to those of the other given the transparency assumptions characteristic of the SMT. In this case, we can use it to argue that though there is nothing grammatically wrong with (3b) it is, given the MTC and the grammatical preference for Move over Bind, inherently unparsable, hence unacceptable under the indicated interpretation.

I have reviewed three instances of SMT reasoning where claims about processing have implications for the competence theory. We have already reviewed arguments that move from competence grammars to interface properties. As is evident (I hope), the SMT has interesting implications for the relationship between the properties of performance systems and competence theories. This should not come as a surprise. We have every reason to think that there is an intimate connection between the shapes of data structures and the algorithms that use them efficiently (see Marr or Gallistel and King on this topic). The SMT operationalizes this truism in the domain of language. Of course, whether the SMT is true is an entirely different issue. Maybe it is, maybe it isn’t. Maybe FL’s data structures are well designed, maybe not. However, for the first time in my linguistic life, I am starting to see how the SMT might function in providing interesting arguments to probe the structure of FL. It seems that at least one version of the SMT has empirical clout and licenses interesting inferences about the structure of performance systems given the properties of competence systems AND vice versa, the structure of competence systems given the properties of performance systems. This version of the SMT further provides a way of understanding minimalist claims about the computational efficiency of grammatical formalisms that makes computational sense.[5] Of course, this may all be wrong (though not wrong-headed), but it is novel (at least to me) and very very exciting.

Last point and I sign off: The above outlines one version of the SMT. There are various interpretations around. I have no idea whether this version is exactly what Chomsky has been proposing (I suspect that it is in the same intellectual region, but I don’t really care if my exegesis is correct). I like this interpretation for I can make sense of it and because it has a pedigree within Generative Grammar (which I will discuss in a proximate future post). Some don’t seem to like it because it does not make the SMT obviously false or incoherent (you know who you are). The above version of the SMT relies on treating it as an empirical thesis (albeit a very abstract one) about good design. Good design is indexed by various empirical properties: fast parsing and easy learning being conspicuous examples. Whether the FL design is “optimal” (whatever that might mean) is less interesting to me than the question of how/whether our very efficient linguistic performance systems are as good as they are because they exploit FL’s basic properties. It seems to be a fact that we are very good at language processing/production and acquisition. Some might think that how we do these things so well (and why) calls for an explanation. One explanation, the one that I have urged that we investigate (partly because I believe we have some non-trivial evidence bearing on the questions already), is that part of what makes us good at what we do is that the data-structures that Gs generate (our linguistic knowledge) have the properties they have. In short: why are we fast parsers and good acquirers? Because grammars embody principles like c-command, subjacency, Extension etc. That folks is the SMT! And that folks is a very interesting conjecture that we are just now beginning to study in a non-trivial manner. And that folks is called progress. Yessss!

[1] This raises the interesting question of how to model WCO if one accepts the SMT. Why don’t we find WCO effects in online measures like the ones that pop out for SCO?
[2] Actually the argument is more involved: if we model this feature of human memory in something like an ACT-R framework (this is based on implementations of this idea by Rick Lewis) then coding c-command into the system proves to be very difficult.
[3] Well, it does say that NOC must occur where movement is prohibited, e.g. into islands:
(i)             John said that [ ec kissing Mary] would upset Bill
Thus, the ec in (i) is not the product of movement, so not a case of OC, and thus must be a case of NOC.
[4] Were both equally optional, NOC would render OC effects invisible, as OC’s observable properties are a proper subset of NOC’s.
[5] I will go into this in more detail in a post that I am working on.


  1. I don't know if this counts exactly as what you're looking for with the question "What has the SMT done for you lately?", but I work in an environment where the attitudes of most of my colleagues toward SMT-like ideas range from mostly indifferent to incomprehending and hostile, and as I once mentioned to you in private, I feel a "minimalist" attitude has made me feel like I'm playing with a larger hand, so to speak.

    I've recently had the task of coming up with a mapping between syntax and semantics that can be used to (robustly) represent certain aspects of incremental parsing, and what I've come up with (and has been adopted by some of my colleagues) is heavily inspired, if not really identical to, the work on neo-Davidsonian semantics by Tim Hunter.

    A lot of work in incremental parsing is inspired by a heavily functionalist theory of processing (uniform information density) that has its own sense of minimality, as well as a good body of empirical evidence. It's promoted by people, a few of whom are vehemently hostile to SMT-like things. But I'm increasingly convinced that, if you *need* a formalism, anything that can plausibly represent the kinds of ambiguities and revised expectations that UID-style theories represent in a flexible manner is going to have a lot of the merge-y and move-y scaffolding, a similar notion of features, and so on...and a very constrained design that doesn't require a lot of unpacking and repacking.

  2. I have slightly lost track of what the SMT is meant to be at this point. Here are two "empirical claims"

    A: The language faculty is an optimal solution to legibility conditions at the interfaces.

    B: Languages are the way they are so that they can be processed (parsed, produced and acquired) reasonably efficiently.

    So there is the boring but necessary terminological question about which of these is the SMT but let's skip that and go to the interesting scientific question: which of them is true?

    So I think A is clearly false and B is very likely true. And maybe (mirabile dictu) we agree on this. If we do then we can argue about how this relates to transparency in the parser etc. but we need to get that straight first.

    But maybe (based on something N said earlier) we need to split B into B1 and B2

    B1: Languages have some properties that mean they can be processed efficiently.

    B2: They have those properties so they can be processed efficiently.

    So I buy into both of these, but I can see that one could accept B1 and not B2.

    1. I buy B1. I do not currently buy B2: so I don't think that Gs have the properties they have SO THAT they could be efficiently parsed, but they have these properties and as a result they can be so parsed. If you also buy B1 then that's great. The empirical issue then becomes to see which properties language needs so that this is true (indeed, we need to see IF this is true, but we agree on that). I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers. But, for now let's celebrate that we agree on how to put the issue. That's great.

      As for A: as I've noted, I am not sure what this says. Thus I am not sure that it is false. But I do agree it has been very hard to work with, and the B1 claim is interesting enough for me.

    2. So let's call B or maybe just B1 the Weak Efficiency Thesis (WET) because it is pretty uncontroversial.

      One way of exploring this would be to say, precisely, what we mean by reasonably efficient, and then only consider theories of grammar that are efficient in that sense: efficient in the sense that they can be parsed and learned in some technically efficient way. That is roughly my research program. So there are some explicit ideas about some properties that lead to efficient recognition and learning, and we have some theorems that show this. So I understand that you don't like the particular technical notion of efficiency that is used, but let's agree to disagree on that.

      But I don't see how the constraints you mention will work with the WET:
      " I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers."

      Why do you think that systems like this will run well? What evidence do you have?

    3. That's the topic of my next post. You won't like it, but it will be put out there in the open. BTW, I am happy to agree to disagree. I will address this as well in my next post. Right now, back to reading theses and grading papers.

  3. I'm confused by the discussion of (3) and (4). What's the difference between data that establishes that "grammars prefer OC style movement over binding", and the fact illustrated in (3)? Why does one pertain to the grammar and the other to the parser?

    1. MTC assumes that 'pro' can be freely generated in the spec of a non-finite TP in English. This pro is what's responsible for non-OC readings (it does not require a c-commanding antecedent, need not be local, etc.). So the problem with (3b) being unavailable in English cannot be that it is an ungrammatical structure. It isn't. So why is it out? Well, the idea was that to get the OC/NOC distinction at all given the MTC one needs a kind of economy story, with movement being preferred to binding where the two are options. If this is a principle of the grammar then transparency (the SMT) says that it is a principle of the parser, i.e. when an ec is encountered in a parse, assume it is a "trace" rather than a null pronoun, all things being equal. So, the parser will treat the ec in (4) as a trace and look for a local antecedent. Thus, the perfectly acceptable structure with pro there will never be available, due to how the parser transparently reflects properties of the grammar (i.e. the economy condition). So the sentence, were it parsable, would be OK, but it is not parsable, and not so for a principled reason, given the SMT. That's the idea.

      I don't expect you to believe this (though I kind of like the explanation), but note that the logic requires the SMT and the transparency assumption between grammars and parsers to run.

    2. Let me see if I get this. On the basis of standard non-OC examples, we suppose that pro is possible in Spec of non-finite TP. This appears to be contradicted by the fact that (3b) is unacceptable. So, in response, we suppose that (3b) is actually grammatical but non-parsable (I take it this means it has something like the same status as a triple-center-embedded sentence). Is that right? If so ... dumb question ... why is the original standard non-OC example not similarly non-parsable?

    3. Because the ec is generally within an island and so cannot be analyzed as a trace:
      (i) John said that [[ec washing himself] was imperative]
      This is the typical case of NOC. Note the ec is within an island and so not analyzable by either the grammar or the parser as a trace of I-merge. Hence, no competition and the parser can drop a pro there.

      The problem for any grammatical theory is that the diagnostics of OC are a proper subset of those for NOC. Hence IF OC did not pre-empt NOC you would never "see" OC. Hence there needs to be some economy condition favoring OC when it is available. Of course, the MTC tells you that one place where it is NOT available is inside an island. Here NOC will be possible if it is licensed by something like a pro. This has the nice property of favoring OC outside island contexts, and this seems roughly correct.

      Note that there are still some issues concerning what to do with the arb interpretation of pro (John thinks that washing oneself is important), though I would suspect that these too are pronouns, just indefinite ones that distribute more or less the way 'one' does. But these are orthogonal to the proposal above.

      Hope this helps.