1. Problem description
An error is reported when building a logistic regression model. The model code is as follows:
import statsmodels.api as sm
formula = "y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16+x17+x18+x19"
model = sm.Logit.from_formula(formula, data = train_data)
The error is as follows:
2. Specific causes
A ValueError means that a function received an argument of an acceptable type but with an inappropriate value.
The gist of the error message is that endog has evaluated to an array with multiple columns. endog is the dependent variable (y in the formula), and this usually happens when the dependent variable is non-numeric: the formula machinery treats it as categorical and expands it into several columns. The solution is therefore to convert the dependent variable to a numeric type.
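As a quick way to confirm this diagnosis, the sketch below checks whether a column is numeric. The DataFrame `toy` and its values are hypothetical stand-ins for train_data, assuming y was read in as text:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data: y holds the strings "0"/"1" rather than numbers,
# which is what often happens when a column is read in as text.
toy = pd.DataFrame({"y": ["0", "1", "1", "0"], "x1": [0.5, 1.2, 3.1, 0.7]})

y_is_numeric = is_numeric_dtype(toy["y"])    # string column, not numeric
x1_is_numeric = is_numeric_dtype(toy["x1"])  # float column, numeric

print(toy.dtypes)
print("y numeric?", y_is_numeric)
print("x1 numeric?", x1_is_numeric)
```

If the dependent variable is reported as non-numeric, it is a likely cause of the error above.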
3. Solve the problem
All of the variables in "y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16+x17+x18+x19" come from the train_data dataset, so use train_data.info() to check the dtype of each column. The results are as follows:
The variable y has the object dtype, so convert it to float64 and check the result again:
import numpy as np
train_data["y"] = train_data["y"].astype(np.float64)  # convert the type (np.float was removed in NumPy 1.24)
train_data.info()  # view the info
Running the logistic regression code again no longer raises the error.
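If some entries of y cannot be parsed as numbers, a direct cast to float raises its own error. One possible alternative, shown here only as a sketch on hypothetical toy data, is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they can be inspected or dropped before modeling:

```python
import pandas as pd

# Hypothetical toy data: one entry ("n/a") cannot be parsed as a number.
toy = pd.DataFrame({"y": ["0", "1", "n/a", "1"]})

# errors="coerce" replaces unparseable entries with NaN instead of raising.
converted = pd.to_numeric(toy["y"], errors="coerce")

# Count how many entries failed to convert.
n_bad = int(converted.isna().sum())

# Keep only the rows where y converted cleanly.
clean = toy.assign(y=converted).dropna(subset=["y"])

print("unparseable entries:", n_bad)
print(clean)
```

Whether dropping such rows is appropriate depends on the data; the point is only to make the bad values visible before fitting.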
nice(✿◠ ‿ ◠)