Walkthrough: Eyetracking data w/empirical logit regression
Wed 12 Sep, 2007 at 1:35 PM
The subset of the design of Kronmüller and Barr (2007) that we will be looking at had a 2x2 design, with Speaker (Same, Different) crossed with Cognitive Load (None, Load). Essential background details for this walkthrough can be found in the MLR paper. You'll also need to download the files from my previous post.
First, we need to load in the data.
> library(lme4)
> load("bySubj.bin")
> load("byItem.bin")
The objects bySubj and byItem contain the data aggregated by Subject and by Item respectively.
In the experiment, each participant contributed data from three trials to each cell of the design. In other words, there were three trials in each of the four cells, for a total of twelve. The data was prepared by aggregating over 50 ms time slices (3 frames of eye data) and over all of the trials in a given condition within each subject (or item, for the analysis by item). Thus, each 50 ms bin consisted of 12 frames of eye data.
For example, take a look at some of the data for subject 30.
> bySubj[bySubj$SubjID==30,c(1:4,8,13:14)]
SubjID AggID SpeakerT CogLoadT Bin y N
1 30 241 Same None 3 0 12
2 30 241 Same None 2 0 12
3 30 241 Same None 1 0 12
4 30 241 Same None 0 0 12
5 30 243 Diff None 0 3 12
6 30 243 Diff None 1 3 12
7 30 243 Diff None 2 5 12
8 30 243 Diff None 3 4 12
9 30 245 Same Load 3 6 12
10 30 245 Same Load 2 7 12
11 30 245 Same Load 1 3 12
12 30 245 Same Load 0 3 12
13 30 247 Diff Load 0 3 12
14 30 247 Diff Load 1 3 12
15 30 247 Diff Load 2 3 12
16 30 247 Diff Load 3 3 12
The variable "y" is the number of frames in the 50 ms window for which the gaze fell within the bounds of the target object; The variable "N" is the total number of frames within each bin. We can compute the empirical logit for each bin using the following formula:

We will create a new variable elogS that will hold these values. We also need to compute regression weights for each case (because the variance depends on the mean, and we are using linear regression on a transformed variable rather than true logistic regression). We compute weights using the formula:

> elogS <- log((bySubj$y + .5) / (bySubj$N - bySubj$y + .5))
> wts <- 1/(bySubj$y + .5) + 1/(bySubj$N - bySubj$y + .5)
The model that we are fitting is a simple line; the regression analysis will estimate the slope and intercept for the line, as well as how the various factors (and their interaction) modulate the slope and intercept terms (see the MLR paper for details).
So the next step is to fit the full model, including all random effects for the Subject analysis:
> bySubj.lmer = lmer2(
elogS ~
Spkr + CLoad + SbyL +
t1 + t1S + t1L + t1SL +
(1 + t1 | SubjID:AggID) + (1 + t1 | SubjID),
data=bySubj, weights=1/wts)
Let's break this command down into its basic parts. First, bySubj.lmer is the name of the variable that will be created by the model fitting procedure lmer2. (We are using development version of lmer, lmer2, because lmer does not take the weights into account.) The first argument to the lmer function is a formula, specifically elogS ~ Spkr + CLoad + SbyL + t1 + t1S + t1L + t1SL + (1 + t1 | SubjID:AggID) + (1 + t1 | SubjID). Note the tilde (~) where an equal sign would normally be; this is the conventional way to write a formula in R. The terms up to the first term enclosed in parentheses are for the fixed effects; the terms in parentheses represent the random effects.
Let's focus first on the fixed effects. Recall that we are basically using lmer2 to estimate the slope and intercept of the best fitting line. Although an intercept term is not in the equation, it is there implicitly by default. The terms Spkr, CLoad, and SbyL are effect-coded predictors that represent the condition that a given observation is in. For example, the Spkr variable is given a value of -.5 for the Same Speaker condition and +.5 for the Different Speaker condition. Thus, the parameter that is estimated for this variable will represent the mean difference between the two conditions for the intercept term, collapsed over the other variables in the design. In other words, it represents what would be a main effect of Speaker in an ANOVA (except that it is testing the main effect only for the intercept term). The CLoad variable was also coded using a (-.5, +.5) scheme, and thus the parameter that is estimated would correspond to a main effect of Cognitive Load (on the intercept). Finally, the interaction term was calculated as sgn(Spkr)*sgn(CLoad)*.5, where the function sgn is the sign (-1 or +1) of the variable, so that the parameter estimated for this variable would correspond to an interaction term in an ANOVA (specifically, the difference between the means of the two diagonals in the design).
The exact same logic applies for the t1 terms in the equation, except that these terms test for significant main and interaction effects on the slope. The term t1 variable codes time (in seconds) from the onset of the analysis window (300 ms after word onset). The parameter estimated for this variable will correspond to the grand mean slope (collapsed over all conditions). The parameter estimated for t1S represents the main effect of Speaker on the slope. This variable was calculated very simply as t1*Spkr. Variable t1L was computed similarly (t1*CLoad) so that the parameter estimated for it will represent the main effect of Cognitive Load; and finally the parameter estimated for t1SL will correspond to the Speaker by Load interaction.
Now, what about the terms for the random effects? Note that we have aggregated together all of the trials within a given condition, and broken up the time variable into a series of 50 ms bins. The bins for a given aggregated condition (henceforth "aggregate") were assigned an ID number (to represent that they are all from the same condition for the same subject). The precise number that is given to a particular "aggregate" does not matter; what matters is that the number given to all the bins in a given aggregate is unique. This is a way of telling lmer that these bins belong together. And we need to do that, because the bins within an aggregate will tend to be correlated with one another (moreso than the bins from different aggregates) and we need to take this correlation into account. Moreover, they are nested within each Subject, and to denote this nesting, we use the syntax SubjID:AggID. This is level two of our three-level model.
The syntax (1 + t1 | SubjID:AggID), while strange at first, makes perfect sense once one realizes that it is telling the function to allow both intercepts and slopes to vary randomly across individual aggregates. The pipe symbol | has the meaning "conditional on"; thus the intercept and slope terms are "conditional on" random variation over aggregates. Thus, if we only wanted the intercept, but not the slope, to vary across aggregates, we would use the syntax (1 | SubjID:AggID).
The term (1 + t1 | SubjID) represents level 3 of our model—thus, each subject has unique effects on the intercept and on the slope that are constant across all aggregates for that subject.
Finally, the argument data=bySubj tells lmer to look for the variables inside the data frame bySubj. If the variables are not found there, it will look outside the data frame. The argument weights=1/wts performs a weighted regression by the weights that we computed above.
Once you enter the code above and press return, R will start performing the required computations and will not return a prompt until the computations are completed. Sometimes (though not in the current case) this can take a very long time. It's often nice to know that there's still something going on, so you can type the command options(verbose=TRUE) prior to running the command and it will give you some convergence statistics, to reassure you that it's still working on a solution.
Now let's look at the output:
> summary(bySubj.lmer)
Linear mixed-effects model fit by REML
Formula: elogS ~ Spkr + CLoad + SbyL + t1 + t1S + t1L + t1SL + (1 + t1 | SubjID:AggID) + (1 + t1 | SubjID)
Data: bySubj
AIC BIC logLik MLdeviance REMLdeviance
2469 2536 -1220 2437 2441
Random effects:
Groups Name Variance Std.Dev. Corr
SubjID:AggID (Intercept) 1.363521 1.16770
t1 39.474848 6.28290 -0.452
SubjID (Intercept) 0.025466 0.15958
t1 0.191524 0.43763 1.000
Residual 0.249136 0.49914
Number of obs: 896, groups: SubjID:AggID, 224; SubjID, 56
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.25874 0.08417 -14.955
Spkr 0.32142 0.16284 1.974
CLoad -0.02784 0.16284 -0.171
SbyL -0.05813 0.16284 -0.357
t1 2.42557 0.48479 5.003
t1S -2.27101 0.96249 -2.360
t1L -0.33024 0.96249 -0.343
t1SL 0.43564 0.96252 0.453
Correlation of Fixed Effects:
(Intr) Spkr CLoad SbyL t1 t1S t1L
Spkr -0.005
CLoad -0.001 0.004
SbyL 0.004 -0.001 -0.005
t1 -0.440 0.006 0.000 -0.006
t1S 0.006 -0.490 -0.006 0.000 -0.006
t1L 0.000 -0.006 -0.490 0.006 -0.001 0.007
t1SL -0.005 0.000 0.006 -0.490 0.007 -0.001 -0.006
There's a lot of information here, some of which I hope to discuss in later posts. But let's proceed by looking at the information about the random effects. The standard deviation for each effect tells you how much the observations vary over aggregates and over subjects. There are separate estimates for the intercept and slope terms for each source of random variation.
What we are more interested in here are the parameter estimates for the fixed effects. We see that the intercept term, which is in fact the grand mean for the intercept term, was about -1.26. Given the coding of the response variable, this means that participants were about 3.53 times more likely (exp(1.38)=3.97) to be looking somewhere other than the target at the onset of the analysis window (this makes sense given that there were two other pictures on the screen and furthermore, that participants would often begin the trial looking at the center of the screen). The significance test for this parameter estimate is not interesting, because it tests the null hypothesis that the Intercept=0 (i.e., that listeners are equally likely to be fixating the target as they were to be doing something else).
We also see that the parameter estimate for the grand mean of the slope term is 2.43 logits/sec. This means that for each 100 ms that passed, listeners became about 1.3 times more likely to fixate the target (exp(2.43/10)=1.28). It makes sense that they look increasingly at the target given that they are listening to speech describing this object.
What we are most interested in finding out, however, is whether the intercept and slope terms were modulated by main or interaction effects. By using an effect coding scheme, we have set things up so that the test of each parameter estimate against the null hypothesis that it is equal to zero gives us a test of main and interaction effects. Note that we are given t values for each parameter estimate-- but no degrees of freedom! The degrees of freedom are not well-defined in a multilevel model. So, one option is to use the normal distribution to look up p-values, which is generally OK when the estimates are based on lots of observations (as is typically the case in psycholinguistic experiments). This gets us pretty close, but Baayen, Davidson, and Bates (in press) show us that we can get better estimates using the MCMC function in R. I confess to not understanding what MCMC is—it has something to do with Bayesian statistics and Markov Chain Monte Carlo random walks through parameter space. In any case Baayen et al. claim that it gives less biased estimates of the p-values. Baayen has implemented it using the function pvals.fnc() in the languageR package.
[update Mar 14, 2007]. The MCMC function works with lmer but not with lmer2 objects. There is a way to get the MCMC values for lmer2, but the values seem a bit wacky; I wonder if it has to do with the weighting of the observations in the lmer2 regression. Anyway-- I would suggest testing for significance using the t values instead, by comparing them against a critical value of t=2.00; that is probably the best way of doing things for now. With this criterion, we have a significant effect of Speaker on the slope, p<.05, and a marginal effect of Speaker on the intercept (p < .10).
OK, so how do we interpret these findings? As I discuss in my paper, effects that interact with a time variable are rate effects; effects that interact with the intercept are anticipatory effects, the result of (probably strategic) processing that takes place prior to the onset of the targeted referring expression. When the speaker was the same speaker, the slope was more strongly positive, indicating that the likelihood of looking at the target increased faster in that case. This is likely to be a speaker-specific priming effect, given that it was not modulated by load. The main effect of Speaker on the intercept, in contrast, is likely to represent strategic anticipation that has nothing to do with the processing of the speech. Interestingly, the effect of Speaker on the intercept goes in the opposite direction (it's negative) compared to its effect on the slope (which is positive). I suggest in the paper that listeners were more likely to look at the target in the Different Speaker than in the Same Speaker condition because in the latter condition, they often anticipated that the speaker would refer to something new.
Item Analysis
We follow the exact same steps for the item analysis.
> elogI <- log((byItem$y + .5) / (byItem$N - byItem$y + .5))
>
> wtsI <- 1/(byItem$y + .5) + 1/(byItem$N - byItem$y + .5)
>
> byItem.lmer <- lmer2(
elogI ~
Spkr + CLoad + SbyL +
t1 + t1S + t1L + t1SL +
(1 + t1 | ItemID:AggID) + (1 + t1 | ItemID)
,data=byItem, weights=1/wtsI)
One complication is that the model fitting procedure does not converge. It terminates with the warning Estimated variance-covariance for factor 'ItemID' is singular. What (I think) that means is that the variance due to ItemID is close to zero for at least one of the terms in the equation. One way to get the algorithm to converge is to look at the partially converged output, see which random effect for ItemID is close to zero, and eliminate it. Doing so suggests getting rid of the ItemID random effect for the t1 term in the equation (estimated variance was 6.1478e-11). We then re-fit the model.
> byItem1.lmer <- lmer2(
elogI ~
Spkr + CLoad + SbyL +
t1 + t1S + t1L + t1SL +
(1 + t1 | ItemID:AggID) + (1 | ItemID)
,data=byItem, weights=1/wtsI)
There's a lot more that could be said about this example, and I hope to elaborate on the output from R as well as on model selection in future posts. Comments/questions on this walkthrough are welcome.