Posted by on Thu 27 Dec 2007 at 1:49 PM:
I didn't quite follow the effects coding scheme you used. What is the relation of .5/-.5 to the log odds? How would you extend this to a categorical variable with more levels (e.g. 3)?
Posted by Dale Barr on Mon 07 Jan 2008 at 11:26 PM:
The -.5, .5 scheme is not related to log odds; it is a coding scheme used to give results directly comparable to what you would get from an ANOVA.
The effect codes, unlike dummy codes, make the "intercept" in the regression model correspond to the grand mean (assuming a balanced design). With the -.5/.5 codes, the parameter estimate for each predictor equals the difference between the marginal means of its two levels, i.e., the main effect in an ANOVA.
If you code the 2x2 using dummy codes, the parameter estimates for variable A will correspond to the "simple effect" of A at the level of B set to 0, and that for B will be the "simple effect" of B at the level of A set to 0.
The dummy coding and effect coding schemes will give you different results whenever there is an interaction; I recommend always using effect coding and then following up a significant interaction with a test for simple effects using dummy coding. In this way, results are interpretable according to the very familiar main effects + interactions scheme.
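To make the contrast concrete, here is a minimal sketch in plain Python (not the R code from the original entry). It assumes a balanced 2x2 design and fits a saturated model directly to four cell means, so the coefficients can be solved for exactly. The cell means and the helper names `solve` and `fit_saturated` are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def fit_saturated(cell_means, codes):
    """Return (intercept, A, B, AxB) reproducing the four cell means exactly.

    codes[0] and codes[1] are the numeric codes for the two levels of each factor.
    """
    cells = list(product((0, 1), repeat=2))  # (level of A, level of B)
    design = [[1, codes[a], codes[b], codes[a] * codes[b]] for a, b in cells]
    return solve(design, [cell_means[c] for c in cells])


# Hypothetical cell means, keyed by (level of A, level of B), with an interaction.
cell_means = {(0, 0): 10, (0, 1): 12, (1, 0): 14, (1, 1): 20}

effect = fit_saturated(cell_means, codes=(Fraction(-1, 2), Fraction(1, 2)))
dummy = fit_saturated(cell_means, codes=(0, 1))

print("effect coding:", [float(v) for v in effect])
print("dummy coding: ", [float(v) for v in dummy])
```

With these made-up means, effect coding gives an intercept of 14 (the grand mean) and an A estimate of 6 (the difference between the marginal means of A), whereas dummy coding gives an intercept of 10 (the cell where both codes are 0) and an A estimate of 4 (the simple effect of A where B is 0). The interaction estimate is 4 under both schemes.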
I'll have more to say about this (and about what to do when there are more than 2 levels) later in another entry.
Thanks for your question!
Posted by on Fri 29 Feb 2008 at 7:21 AM:
Your example is based on an experimental design in which fixations are expected on a single target region. However, what should I do if my design has two concurrent target objects (both likely to be fixated, with no preferred choice) that I would like to model in a multilinear logistic regression?
Suppose the dataframe has two regions of interest, let's say "yellow" and "red". How, and on which of the two objects, is the intercept calculated? It seems to me that R always creates the baseline from the values connected to the first available variable name. So, if red is at the top of the dataframe, it will be used to calculate the baseline. Am I wrong? Can you run a multilevel analysis in this kind of setting?
Thanks a lot
Posted by Dale Barr on Fri 14 Mar 2008 at 4:37 PM:
In an ideal world, it would be possible to fit a multilevel multinomial logistic regression model to your data, which can deal with more than two regions of interest. However, we are not quite there yet, although I'm confident we'll get there eventually.
In lieu of a better solution, what some people have done is to put the identity of the region as a predictor in the model. I find this practice questionable because looks to region 1 and region 2 within the same trial are negatively correlated. The model does not take this negative correlation into account, so it underestimates the standard error and inflates the Type I error rate.
But there is another solution, and that is to run separate analyses, each comparing two regions. So, if you want to compare "yellow" to "red", you can code the response as 1 for yellow and 0 for red, and drop all the frames from the data set where the participant is looking somewhere else. There's a paper by Scheepers and colleagues (I think the citation is Arai, Van Gompel & Scheepers (2007)) where they do an analysis using a 'log ratio' that is similar to what I am describing here. Once you have the 1s and 0s, you can do an empirical logit transformation, aggregating up to the Subject (or Item) x Condition level. If you want to compare "red" to something else, you can do this in a separate analysis. This approach does not make it possible to compare across analyses, for example, to show that the slope for yellow < red < blue; for that we'll have to wait for mixed-effect multinomial logit models with robust standard errors to become more readily available. Wish I had a better solution!
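As a rough sketch of the yellow-versus-red recipe above, in plain Python: drop frames on other regions, code yellow as 1 and red as 0, aggregate to the Subject x Condition level, and apply the empirical logit. The comment does not spell out the formula, so this assumes the usual definition log((y + .5) / (n - y + .5)) with approximate variance 1/(y + .5) + 1/(n - y + .5). The data layout, the region labels, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def empirical_logit(y, n):
    """Empirical logit for y 'yellow' frames out of n, plus its approximate variance.

    The 0.5 adjustment keeps the value finite when y == 0 or y == n.
    """
    elog = math.log((y + 0.5) / (n - y + 0.5))
    variance = 1.0 / (y + 0.5) + 1.0 / (n - y + 0.5)
    return elog, variance


def yellow_vs_red(frames):
    """Aggregate frame-level looks to the Subject x Condition level.

    frames: iterable of (subject, condition, region) tuples, one per frame
    (an assumed layout, not the format of any particular eye-tracking export).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [yellow frames, yellow + red frames]
    for subject, condition, region in frames:
        if region not in ("yellow", "red"):
            continue  # drop frames where the participant is looking somewhere else
        cell = counts[(subject, condition)]
        cell[0] += region == "yellow"  # response coded 1 for yellow, 0 for red
        cell[1] += 1
    return {key: empirical_logit(y, n) for key, (y, n) in counts.items()}


# Hypothetical frame-level data for one subject in two conditions.
frames = [
    ("s1", "A", "yellow"), ("s1", "A", "yellow"), ("s1", "A", "red"), ("s1", "A", "blue"),
    ("s1", "B", "red"), ("s1", "B", "red"), ("s1", "B", "yellow"), ("s1", "B", "elsewhere"),
]

result = yellow_vs_red(frames)
for key, (elog, variance) in sorted(result.items()):
    print(key, round(elog, 3), round(variance, 3))
```

To aggregate by item instead, the same function can be reused with the item identifier in place of the subject. The resulting values would then serve as the response in a separate regression for each pair of regions, with the inverse of the variance as a possible weight.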
Posted by on Sun 13 Apr 2008 at 10:24 AM:
It's an interesting article.