Linear model (lm) when dependent variable is a factor/categorical variable?

Linear regression does not accept a categorical dependent variable; the response has to be continuous. Since your AccountStatus variable has only four levels, it is not feasible to treat it as continuous. Before commencing any statistical analysis, one should be aware of the measurement levels of one's variables.

What you can do is use multinomial logistic regression, see here for instance. Alternatively, you can recode the AccountStatus as dichotomous and use simple logistic regression.
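As a rough sketch of both options, assuming a data frame like the one below (the names df, Balance, IsActive and the level labels are made up for illustration, and nnet::multinom() is just one of several ways to fit a multinomial model in R):

```r
library(nnet)  # recommended package that ships with standard R installs

# Hypothetical data standing in for the real data set
set.seed(1)
df <- data.frame(
  AccountStatus = factor(sample(c("closed", "dormant", "active", "premium"),
                                200, replace = TRUE)),
  Balance = rnorm(200, mean = 1, sd = 0.3)
)

# Option 1: multinomial logistic regression on all four levels
# (one row of coefficients per non-reference level)
fit_multi <- multinom(AccountStatus ~ Balance, data = df, trace = FALSE)

# Option 2: collapse the four levels into two and fit a binary logistic regression
df$IsActive <- df$AccountStatus %in% c("active", "premium")
fit_bin <- glm(IsActive ~ Balance, data = df, family = binomial)

coef(fit_multi)
coef(fit_bin)
```

Which levels to merge in the second option is a substantive decision about your data, not something the model can tell you.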

Sorry to disappoint you, but this is just an inherent restriction of multiple regression, nothing to do with R really. If you want to learn more about which statistical technique is appropriate for different combinations of measurement levels of dependent and independent variables, I can wholeheartedly recommend this book.

Expanding a little bit on @MaximK's answer: multinomial approaches are appropriate when the levels of the factor are unordered. In your case, however, where the measurement level is ordinal (i.e. ordered, but the distance between the levels is unknown/undefined), you can get more out of your data by doing ordinal regression, e.g. with the polr() function in the MASS package or with functions in the ordinal package. However, since ordinal regression has a different and more complex underlying theory than simple linear regression, you should probably read more about it (e.g. at the Wikipedia article linked above, in the vignettes of the ordinal package, at the UCLA stats consulting page on ordinal regression, or by browsing related questions on CrossValidated).
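A minimal sketch of what a polr() fit might look like, using simulated data (the data frame df, the predictor Income, and the level labels are all assumptions for illustration, not taken from the question):

```r
library(MASS)  # recommended package that ships with standard R installs

# Simulate an ordered four-level response driven by one hypothetical predictor
set.seed(1)
df <- data.frame(Income = rnorm(200))
latent <- df$Income + rlogis(200)
df$AccountStatus <- cut(latent,
                        breaks = c(-Inf, -1, 0, 1, Inf),
                        labels = c("poor", "fair", "good", "excellent"),
                        ordered_result = TRUE)

# Proportional-odds (ordinal logistic) regression; Hess = TRUE keeps the
# Hessian so that summary() can report standard errors
fit_ord <- polr(AccountStatus ~ Income, data = df, Hess = TRUE)

coef(fit_ord)   # one slope per predictor
fit_ord$zeta    # one cut-point between each pair of adjacent levels
```

The important step is that the response is an ordered factor; with an unordered factor the level order would just be whatever R happened to assign.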

If you can assign a numeric value to each level of the variable, then you might have a solution. You have to recode the levels as numbers and then convert the variable to numeric. Here is how:

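library(plyr)  # provides revalue()

# Map each factor level to the score it should carry (the result is still a factor)
my.data2$islamic_leviathan_score <- revalue(my.data2$islamic_leviathan,
               c("(1) Very Suitable"="3", "(2) Suitable"="2", "(3) Somewhat Suitable"="1", "(4) Not Suitable At All"="-1"))

# Go through as.character() so the labels, not the internal level codes, become numbers
my.data2$islamic_leviathan_score_1 <- as.numeric(as.character(my.data2$islamic_leviathan_score))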

This recodes the levels and then converts the variable to numeric. The results I get are consistent with the original values in the dataset when the variable was still a factor. You can use the same approach to rename the levels to whatever you like before converting them to numeric.
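For reference, the same recoding can be sketched in base R with a named lookup vector, in case you would rather not load plyr. The vector names responses, scores and numeric_score are hypothetical; the level labels and scores are the ones used above:

```r
# A small stand-in for the factor column
responses <- factor(c("(1) Very Suitable", "(3) Somewhat Suitable",
                      "(4) Not Suitable At All", "(2) Suitable"))

# Lookup table: level label -> numeric score
scores <- c("(1) Very Suitable" = 3, "(2) Suitable" = 2,
            "(3) Somewhat Suitable" = 1, "(4) Not Suitable At All" = -1)

# Index the lookup table by label; as.character() matters here, because
# indexing by the factor itself would use its integer level codes instead
numeric_score <- unname(scores[as.character(responses)])

numeric_score          # 3 1 -1 2
as.numeric(responses)  # 1 3 4 2: the level codes, not the scores
```

The last line shows why calling as.numeric() directly on a factor is not enough: it returns the position of each level rather than the value you intended.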

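Finally, this is worth doing because it lets you draw histograms or use the variable as the response in lm(), neither of which works while it is a factor. Keep in mind that lm() will then treat the scores you chose as real distances between the levels, so pick them with care.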

Hope this helps!
