Of buckles and buckets: Experimentally controlling anticipatory effects in a lexical competition study
Tue 09 Oct, 2007 at 11:18 PM
Visual world studies investigate hypotheses of the form "Does constraint X influence the processing of linguistic fragment Y?" One necessary feature of such studies is that information corresponding to constraint X must be presented temporally prior to Y. Consequently, there may be effects of X on a participant's looking behavior independently of whether X imposes any constraint on the processing of Y. These anticipatory effects, when not appropriately controlled, can cloud the interpretation of results.
In the MLR paper, I discussed a statistical approach to controlling anticipatory effects; it is also possible to control them experimentally. Here I discuss (as yet unpublished) results from a study investigating whether listeners use knowledge about a speaker's perspective to constrain reference resolution. I show how the inclusion of an appropriate experimental baseline can be used to help untangle linguistic from nonlinguistic effects. The analysis is much simpler and more transparent than the curve-fitting exercises that I've explored in the MLR paper or in previous postings.
The data files can be downloaded here [format: zip (Windows) or tarball (UNIX, Mac OS)].
This entry and the corresponding dataset were originally posted on Oct. 9th but were revised on Oct. 18th after I discovered an error in the original data set.
Participants saw displays containing four pictures (such as one of the two shown at left) and heard a speaker refer to one of the four pictures, the "target" picture (e.g., "click on the bucket"). There was also a "critical" picture that in one condition was a competitor: the first syllable or so of the name of the depicted object overlapped phonologically with the name of the target (e.g., buckle for bucket). See the right panel in the above figure, where the critical object is highlighted in red (of course, this highlighting was not seen by the subject). It was expected that the presence of a competitor would temporarily interfere with the identification of the target. In a second condition, the critical object was a noncompetitor (e.g., stepladder; see the left panel). In addition to this Competitor manipulation, I also manipulated whether the listener believed that the speaker knew the identity of this critical picture (the independent variable Ground). In the Privileged condition, the listener thought that the speaker did not know about this picture; in the Shared condition, the listener believed that the speaker did know about it. Obviously, speakers can't refer to things that they don't know about. My question was whether listeners would be able to use this knowledge to constrain the mapping of words to referents, or whether this mapping process was largely automatic. If they can, then a competitor that is privileged should produce less competition than one that is shared. If they can't, then the competition should be roughly equivalent.
Now, telling listeners that a speaker doesn't know about some picture can make them look at it less than other pictures that the speaker is assumed to know about. So listeners may be less likely overall to look at the critical object when it's privileged, relative to when it's shared, regardless of whether or not it is a competitor. For example, they may attempt to restrict their focus of attention to things that are mutually known in anticipation of a forthcoming referring expression. But that by itself wouldn't answer the question of whether this knowledge is used in the referential process. It is possible that the mapping process proceeds relatively automatically and does not use this knowledge to constrain the binding of words to referents. I know that it may sound odd to say that at one level, listeners are taking into account the speaker's perspective, while at another level, they are not. But this kind of dissociation is possible in cognition. Fodor (1983) gives the example of visual illusions to demonstrate this point: we may know that a visual illusion is just an illusion but can't avoid falling for it anyway. I reasoned that something like this kind of automatic processing could underlie the so-called "egocentrism" that Keysar, myself and colleagues have seen in a number of studies (e.g., Keysar, Lin, & Barr, 2003).
The beauty of the design of the current study (which was based on Hanna, Tanenhaus, and Trueswell, 2003) is that it has a built-in control for anticipatory effects of Ground. Any anticipatory advantage for the critical object when it is shared compared to when it is privileged will be present independently of whether the critical object is a competitor or noncompetitor. The logic is to analyze the 'competition' effect, the likelihood of fixating the critical object when it is a competitor relative to when it is a noncompetitor. The competition effect for the privileged condition is given by comparing the privileged competitor to the privileged noncompetitor; the effect for the Shared condition is given by comparing the shared competitor to the shared noncompetitor. These competition effects will be 'free and clear' of any anticipatory effects. Thus, the main prediction was: If listeners can use common ground to constrain lexical processing, the competition effect in the privileged condition should be less than that in the shared condition.
Before going into the analyses, I should mention one additional feature of the task. Listeners were occasionally given a memory test at the end of the trial to see if they could remember the identity of the privileged object. The original motivation for this was as a kind of manipulation check for the Ground variable, to show that listeners were indeed paying attention to what the speaker knew about. I mention this feature because there is a numerical trend in one of the analyses that seems to be the result of the extra attention given to the privileged object for the memory task.
Twenty listeners were included in the study, each of whom completed sixteen experimental items (and an equivalent number of fillers). Further details of the experiment can be found in the (hopefully) forthcoming report.
I defined an analysis window spanning from 180 ms after the onset of the target word (e.g., "bucket") until 180 ms after the end of the longest word in the study (which was 736 ms); in other words, a 736 ms window spanning from 180 ms to 916 ms after word onset. For each trial, I calculated the number of frames that the listener looked at the critical object, as well as the total number of frames in the window (which was always 184 frames, given that the sampling rate of the eyetracker was 250 Hz).
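The window arithmetic above can be checked with a few lines of Python. This is an illustrative sketch (the post's own analyses are in R); the constant and function names are hypothetical, and the values are the ones stated in the text: a 180 ms offset, a 736 ms longest word, and a 250 Hz eyetracker, i.e., one sample every 4 ms.

```python
# Analysis-window arithmetic from the text. Names are illustrative only.
SAMPLING_RATE_HZ = 250   # eyetracker sampling rate stated in the text
OFFSET_MS = 180          # delay applied to both window edges
LONGEST_WORD_MS = 736    # duration of the longest target word


def analysis_window(offset_ms: int, longest_word_ms: int, rate_hz: int) -> tuple[int, int, int]:
    """Return (start_ms, end_ms, n_frames), measured from target-word onset."""
    start_ms = offset_ms
    end_ms = offset_ms + longest_word_ms
    ms_per_frame = 1000 / rate_hz                 # 4 ms per sample at 250 Hz
    n_frames = round((end_ms - start_ms) / ms_per_frame)
    return start_ms, end_ms, n_frames


start, end, frames = analysis_window(OFFSET_MS, LONGEST_WORD_MS, SAMPLING_RATE_HZ)
print(start, end, frames)  # prints: 180 916 184
```

The printed window edges and frame count match the 180 to 916 ms window and the 184 frames per trial given above.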
Here is what the curves look like for the critical object in each of the four conditions:

As expected, listeners looked more at shared than at privileged objects at the onset of the analysis window (0 ms in the plot, which is 180 ms after word onset). But they did so equally whether the critical object was a competitor (buckle) or noncompetitor (stepladder).
In the MLR paper, I discussed how one can control for anticipatory effects by modeling the curves. In the current case, we are going to collapse over the time variable, since the experimental design allows us to control for anticipatory effects.
Take a look at the data for the first two subjects:

The variable CompetitorT corresponds to whether the critical object was a competitor (CP) or noncompetitor (NC). The variable Competitor is the effect coded variable for the regression. GroundT corresponds to whether the critical object was privileged (P) or shared (S), and Ground is the corresponding effect coded variable for the regression. CbyG is the effect code for the interaction (sign of Competitor * sign of Ground * .5). Y is the number of frames in the window for which the point of gaze was on the critical object, while N is the total number of frames in the window.
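The coding scheme just described can be sketched as a small Python helper. This is hypothetical, not the script used to build the data file, and the assignment of +.5 and -.5 to particular levels is an assumption inferred from the cell-mean formulas given later in the post. A useful property to check is that each code sums to zero across the four cells, so the intercept equals the average of the four estimated cell means.

```python
import math

# Assumed level-to-code mapping, inferred from the cell-mean formulas in the R code:
# competitor (CP) = +.5, noncompetitor (NC) = -.5, shared (S) = +.5, privileged (P) = -.5.
COMPETITOR_CODES = {"CP": 0.5, "NC": -0.5}
GROUND_CODES = {"S": 0.5, "P": -0.5}


def effect_codes(competitor_t: str, ground_t: str) -> dict[str, float]:
    """Return the Competitor, Ground, and CbyG effect codes for one trial."""
    competitor = COMPETITOR_CODES[competitor_t]
    ground = GROUND_CODES[ground_t]
    # Interaction code as described in the text: sign of Competitor * sign of Ground * .5
    cbyg = math.copysign(1, competitor) * math.copysign(1, ground) * 0.5
    return {"Competitor": competitor, "Ground": ground, "CbyG": cbyg}


for c in ("NC", "CP"):
    for g in ("P", "S"):
        print(c, g, effect_codes(c, g))
```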
Although for each trial we have made 184 observations, it is certainly not the case that these individual observations are independent (the eye position was sampled once every 4 ms). So we can convert the observations for a given trial into a single independent number by aggregating, either by computing a proportion (Y / N) or via the empirical logit function y'=ln((y+.5)/(n-y+.5)) (McCullagh & Nelder, 1989). I will first show the results from a 'conventional' analysis on the proportional data and then do the same analysis using weighted empirical logit regression (quasi-MLR). These analyses give similar answers, but suggest different numerical trends for the interaction term.
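As a rough illustration of the aggregation step, here is a Python sketch that collapses one trial's frame-by-frame gaze record (True when gaze is on the critical object) into Y and N, and then into either a proportion or an empirical logit. The function names and the 46-of-184 example trial are hypothetical; the formula is the one given above. The +.5 terms keep the logit finite for trials where Y = 0 or Y = N, which a plain logit of the proportion could not handle.

```python
import math
from collections.abc import Sequence


def aggregate_trial(on_critical: Sequence[bool]) -> tuple[int, int]:
    """Collapse one trial's frame-by-frame gaze samples into (Y, N)."""
    y = sum(1 for frame in on_critical if frame)  # frames on the critical object
    return y, len(on_critical)                     # N = all frames in the window


def proportion(y: int, n: int) -> float:
    """Aggregate used in the 'conventional' analysis: Y / N."""
    return y / n


def empirical_logit(y: int, n: int) -> float:
    """ln((y + .5) / (n - y + .5)); stays finite when y == 0 or y == n."""
    return math.log((y + 0.5) / (n - y + 0.5))


# Hypothetical trial: gaze on the critical object for 46 of 184 frames.
trial = [True] * 46 + [False] * 138
y, n = aggregate_trial(trial)
print(y, n, proportion(y, n), empirical_logit(y, n))
```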
Now we are ready to run the regression. Since we have only aggregated up to the level of the individual trial, we are able to fit a model with crossed effects of Subject and Item, thereby simultaneously accounting for random variance due to both factors (Baayen, Davidson, and Bates, under review).
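library(lme4)
# proportion of frames in the window spent on the critical object
interf$p <- interf$Y/interf$N
interf.lmerP <- lmer(
p ~
Competitor + Ground + CbyG + (1 | SubjID) + (1 | ItemID),
data=interf
)
summary(interf.lmerP)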
This gives us the following output:

If we have a look at the fixed effects, we are getting t values (actually, equivalent to a Wald z) of 5.56 for the main effect of Competitor (p < .01), 4.48 for the main effect of Ground (p < .01) and .54 for the interaction (p = .59). (Incidentally, we can compute two-tailed p values from t using the R formula 2*(1-pnorm(abs(t))).) So, this analysis shows no reliable difference in lexical competition for privileged and shared competitors, suggesting that listeners are unable to use this information.
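For readers working outside R, the same two-tailed conversion can be sketched in Python with only the standard library. This assumes, as the post does, that the reported t values can be treated as Wald z statistics; 2*(1 - Phi(|z|)) is written here via math.erfc, which is algebraically identical and more accurate for large z. The function name is hypothetical.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p for a Wald z: 2 * (1 - Phi(|z|)), computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# z values reported above for the proportional analysis
for label, z in [("Competitor", 5.56), ("Ground", 4.48), ("CbyG", 0.54)]:
    print(f"{label:10s} z = {z:5.2f}  p = {two_tailed_p(z):.2g}")
```

The interaction line reproduces the p = .59 reported above; both main effects come out far below .01.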
OK, now let's do the quasi-MLR analysis using weighted empirical logit regression. We first need to compute the empirical logit as well as the weights. Then we run basically the same linear mixed-effects regression as we did for the proportional case.
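# empirical logit of the proportion of frames on the critical object
interf$elog <- log((interf$Y+.5)/(interf$N-interf$Y+.5))
# approximate sampling variance of the empirical logit; the weights are its inverse
interf$ewt <- 1/(interf$Y+.5)+1/(interf$N-interf$Y+.5)
interf.lmer2 <- lmer2(
elog ~
Competitor + Ground + CbyG + (1 | SubjID) + (1 | ItemID),
data=interf, weights=1/ewt
)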
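NOTE: we are using lmer2 rather than lmer because, in the version of lme4 available at the time of writing, the weights option only works for lmer2. (Later versions of lme4 dropped lmer2, and lmer itself accepts a weights argument.)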
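To see what the weighting does, the Python sketch below (an illustration, not part of the original analysis; names and counts are hypothetical) computes the text's ewt, the approximate sampling variance of the empirical logit, and its inverse for a few counts out of N = 184 frames. Trials with proportions near 0 or 1 have high-variance logits and so receive little weight, while trials near .5 receive the most.

```python
def elog_variance(y: int, n: int) -> float:
    """The text's ewt: approximate sampling variance of the empirical logit."""
    return 1 / (y + 0.5) + 1 / (n - y + 0.5)


def elog_weight(y: int, n: int) -> float:
    """Inverse-variance weight, matching weights=1/ewt in the lmer2 call."""
    return 1 / elog_variance(y, n)


N_FRAMES = 184  # frames per analysis window, as stated in the text
for y in (0, 18, 92, 166, 184):  # hypothetical counts
    print(f"Y = {y:3d}  p = {y / N_FRAMES:.2f}  weight = {elog_weight(y, N_FRAMES):.2f}")
```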
We get the following output:

Note that we get the same main effects as in the previous analysis, and the interaction is still nonsignificant (p = .38). However, the sign of the interaction coefficient has changed; so if anything, it suggests a numerical trend in the opposite direction from the proportional analysis.
We can compute the estimated cell means and simple effects as follows:
# extract regression coefficients from lmer2 object
icept <- as.vector(fixef(interf.lmer2)["(Intercept)"])
ceff <- as.vector(fixef(interf.lmer2)["Competitor"])
geff <- as.vector(fixef(interf.lmer2)["Ground"])
cgeff <- as.vector(fixef(interf.lmer2)["CbyG"])
# calculate cell means
Np <- icept + (-.5)*ceff + (-.5)*geff + (.5)*cgeff # noncompetitor, privileged
Cp <- icept + (.5)*ceff + (-.5)*geff + (-.5)*cgeff # competitor, privileged
Ns <- icept + (-.5)*ceff + (.5)*geff + (-.5)*cgeff # noncompetitor, shared
Cs <- icept + (.5)*ceff + (.5)*geff + (.5)*cgeff # competitor, shared
CompEffP <- Cp-Np # simple effect of competitor when it was privileged
CompEffS <- Cs-Ns # simple effect of competitor when it was shared
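The algebra behind these formulas can be made explicit. Substituting the effect codes, the simple competition effects reduce to ceff - cgeff (privileged) and ceff + cgeff (shared): the intercept and the Ground coefficient cancel, which is the regression counterpart of the design's built-in control for anticipatory effects of Ground. The Python sketch below mirrors the R code and then inverts the two simple effects reported in the next paragraph (.85 and .57) to back out the Competitor and interaction coefficients on the empirical logit scale. Because those reported values are rounded, the recovered coefficients are approximate; the function names are hypothetical.

```python
def cell_means(icept: float, ceff: float, geff: float, cgeff: float) -> dict[str, float]:
    """Cell means implied by the +/-.5 effect codes (mirrors the R code above)."""
    return {
        "Np": icept - 0.5 * ceff - 0.5 * geff + 0.5 * cgeff,  # noncompetitor, privileged
        "Cp": icept + 0.5 * ceff - 0.5 * geff - 0.5 * cgeff,  # competitor, privileged
        "Ns": icept - 0.5 * ceff + 0.5 * geff - 0.5 * cgeff,  # noncompetitor, shared
        "Cs": icept + 0.5 * ceff + 0.5 * geff + 0.5 * cgeff,  # competitor, shared
    }


def simple_effects(icept: float, ceff: float, geff: float, cgeff: float) -> tuple[float, float]:
    """Return (CompEffP, CompEffS); algebraically (ceff - cgeff, ceff + cgeff)."""
    m = cell_means(icept, ceff, geff, cgeff)
    return m["Cp"] - m["Np"], m["Cs"] - m["Ns"]


# Back out the coefficients from the (rounded) simple effects reported in the post.
comp_eff_p, comp_eff_s = 0.85, 0.57
ceff_hat = (comp_eff_p + comp_eff_s) / 2   # average competition effect
cgeff_hat = (comp_eff_s - comp_eff_p) / 2  # interaction coefficient (CbyG coded +/-.5)
print(round(ceff_hat, 2), round(cgeff_hat, 2))  # prints: 0.71 -0.14
```

The negative interaction coefficient is what produces the larger competition effect in the privileged condition.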
If we look at the CompEffP and CompEffS variables, we see that the competition effect was actually larger when the critical object was privileged (CompEffP = .85) than when it was shared (CompEffS = .57)! How can this be?
It is informative to graph the estimated cell means from the two regressions side by side.

It may seem anomalous to find a trend toward more competition in the privileged condition, but this trend makes sense given the memory task, which could have drawn additional attention to privileged objects. Consequently, privileged objects would be more active in working memory (even though listeners tried to avoid looking at them), and thus more available for referential mapping.
So, the analyses provide the same overall picture but suggest distinct trends for the interaction effect. Which trend should we believe? My view (shared by Florian Jaeger) is that the appropriate scale for interpreting effects on a categorical variable is the log odds scale. Agresti (2002) notes that the 'linear probability model' (i.e., doing linear regressions on proportions) has a 'major structural defect', since the relationship between a predictor variable and the probability of the response is nonlinear. And Jacob Cohen et al. also note that effects of a predictor variable on a categorical variable are multiplicative while linear regression (and ANOVA) assume effects that are additive. So although my views may conflict with conventional practice in visual world research, at least I am in good company.
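A small numerical illustration of why the choice of scale matters: the Python sketch below uses made-up cell proportions (not the study's data) in which the competition effect is larger in the shared condition on the proportion scale (.06 vs. .05) but larger in the privileged condition on the log odds scale (about .75 vs. .34). Equal steps in log odds correspond to smaller steps in probability near 0 or 1 than near .5, so a difference of differences can change sign when the data are re-expressed, which is one plausible reason the two regressions above point in opposite directions.

```python
import math


def logit(p: float) -> float:
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Hypothetical cell proportions (NOT the study's data), chosen only to show that
# the two scales can rank the same pair of competition effects differently.
cells = {
    "privileged": {"noncompetitor": 0.05, "competitor": 0.10},
    "shared": {"noncompetitor": 0.20, "competitor": 0.26},
}

effects = {}
for ground, c in cells.items():
    diff_p = c["competitor"] - c["noncompetitor"]                 # proportion scale
    diff_lo = logit(c["competitor"]) - logit(c["noncompetitor"])  # log odds scale
    effects[ground] = (diff_p, diff_lo)
    print(f"{ground:10s} proportion diff = {diff_p:.2f}  log-odds diff = {diff_lo:.2f}")
```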
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: Wiley.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (under review). Mixed-effects modeling with crossed random effects for subjects and items. Manuscript under review.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2002). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.
Hanna, J., Tanenhaus, M. K., & Trueswell, J. C. (2003). The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language, 49, 43-61.
Jaeger, T. F. (under review). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Manuscript under review.
Keysar, B., Lin, S., & Barr, D. J. (2003). Limits on theory of mind use in adults. Cognition, 89, 25-41.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. London: Chapman and Hall.