What is causing this error? Coefficients not defined because of singularities
The issue is perfect collinearity. Namely,
spring + summer + autumn + winter == 1
small + medium + large == 1
low_flow + med_flow + high_flow == 1
Constant term == 1
By this I mean that those identities hold for each observation individually. (E.g., only one of the seasons is equal to one.)
So, for instance, lm cannot distinguish between the intercept and the sum of all the seasons' effects. Perhaps this or this will help to get the idea better. More technically, the OLS estimates require inverting the matrix X'X, where X is the design matrix; because the columns of X are linearly dependent here, that matrix is singular and cannot be inverted.
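To see the collinearity concretely, here is a minimal sketch on made-up data (the data frame d, the response y and the random values are assumptions for illustration, not the asker's trainOne). It checks that the season dummies sum to one in every row and shows that the design matrix has fewer independent columns than coefficients.

```r
# Synthetic data; every name and value here is invented for illustration.
set.seed(1)
n <- 40
season <- rep(c("spring", "summer", "autumn", "winter"), length.out = n)
d <- data.frame(
  y      = rnorm(n),
  spring = as.integer(season == "spring"),
  summer = as.integer(season == "summer"),
  autumn = as.integer(season == "autumn"),
  winter = as.integer(season == "winter")
)

# The four dummies sum to 1 in every row, so together they
# reproduce the intercept column exactly.
all(rowSums(d[, c("spring", "summer", "autumn", "winter")]) == 1)

# lm() still fits, but reports NA for the coefficient it cannot identify.
fit <- lm(y ~ spring + summer + autumn + winter, data = d)
coef(fit)

# The design matrix has 5 columns but only 4 linearly independent ones.
X <- model.matrix(~ spring + summer + autumn + winter, data = d)
ncol(X)
qr(X)$rank
```

Dropping any one dummy from the group restores full rank, which is what the fix below does for each of the three groups.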
To fix this, you may run, e.g.,
model_1 <- lm(S ~ A + B + C + D + E + F + G + spring + summer + autumn + small + medium + low_flow + med_flow, data = trainOne)
Also see this question.
@JuliusVainora has already given you a good explanation of how the error occurs, which I will not repeat. However, Julius' answer is only one method, and it might not be satisfying if you don't realize that there really is a value for the cases where winter = 1, large = 1 and high_flow = 1: it appears in the output as the value for "(Intercept)". You might be able to make the result more interpretable by adding +0 to your formula. (Or you might not, depending on the data situation.)
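As a rough illustration of why +0 may or may not help, the sketch below uses made-up data (d, y and the dummy columns are assumptions, not the original data set). With a single group of dummies, removing the intercept yields one estimate per level; with two groups, one redundancy remains, because both groups still sum to the same constant column.

```r
# Synthetic data; every name and value here is invented for illustration.
set.seed(2)
n <- 60
season <- rep(c("spring", "summer", "autumn", "winter"), length.out = n)
size   <- rep(c("small", "medium", "large"), length.out = n)
d <- data.frame(
  y      = rnorm(n),
  spring = as.integer(season == "spring"),
  summer = as.integer(season == "summer"),
  autumn = as.integer(season == "autumn"),
  winter = as.integer(season == "winter"),
  small  = as.integer(size == "small"),
  medium = as.integer(size == "medium"),
  large  = as.integer(size == "large")
)

# One dummy group, no intercept: each coefficient is that season's mean of y.
fit_one <- lm(y ~ 0 + spring + summer + autumn + winter, data = d)
coef(fit_one)

# Two dummy groups, no intercept: the season dummies and the size dummies
# both sum to 1, so one coefficient is still not identifiable.
fit_two <- lm(y ~ 0 + spring + summer + autumn + winter +
                small + medium + large, data = d)
sum(is.na(coef(fit_two)))
```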
However, I think that you really should re-examine how your categorical variables are coded. You are using one dummy variable per level, a method that you are copying from some other system, perhaps SAS or SPSS? That will predictably cause problems for you in the future, and it is painful to code and maintain. R's data.frame function can automagically create factors that encode multiple levels in a single variable (this was the default for character columns before R 4.0.0; in later versions use stringsAsFactors = TRUE or call factor() yourself). (Read ?factor.) So your formula would become:
S ~ A + B + C + D + E + F + G + season + size + flow
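If the data already arrive as one dummy column per level, one possible way to collapse them into a factor is sketched below. The data are made up, and the column names and the choice of winter as the reference level are assumptions for illustration. The same pattern would apply to the size and flow groups.

```r
# Synthetic data; every name and value here is invented for illustration.
set.seed(3)
n <- 40
season_cols <- c("spring", "summer", "autumn", "winter")
season_raw  <- rep(season_cols, length.out = n)
d <- data.frame(
  y      = rnorm(n),
  spring = as.integer(season_raw == "spring"),
  summer = as.integer(season_raw == "summer"),
  autumn = as.integer(season_raw == "autumn"),
  winter = as.integer(season_raw == "winter")
)

# For each row, find which dummy column holds the 1 and use its name as the level.
idx <- max.col(as.matrix(d[season_cols]), ties.method = "first")
d$season <- factor(season_cols[idx], levels = season_cols)

# Choose the reference level explicitly; its mean becomes the intercept.
d$season <- relevel(d$season, ref = "winter")

# lm() expands the factor into contrasts itself, so no coefficient is NA.
fit <- lm(y ~ season, data = d)
coef(fit)
```

With the default treatment contrasts, "(Intercept)" is the winter mean and each remaining coefficient is that season's difference from winter.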