Do I have to cut the y (prediction) column from a Pandas DataFrame with Scikit-learn?

qalis :

I've split my Pandas DataFrame into train_X and train_y parts, where train_X has all N columns and train_y has only the N-th column, which holds the variable that I want to predict. Currently I'm doing:

train_X.drop("N-th column name", axis=1, inplace=True)
model = SomeSklearnModel()
model.fit(train_X, train_y)

Do I have to do it "by hand" (i.e. using drop() on train_X), or can I just do the 3rd line and Scikit-learn will "know" which column train_y is and not use it for model training (only for checking results)?

Chris A :

You must declare X and y explicitly when calling fit on a sklearn estimator. Generally by the time you're ready to split your data into training and testing sets, X should include model features only, so should not include your target y.
There are many ways to separate the features from the target; here are a couple of common ones, using a small iris-style dataset as an example:

# Setup
import pandas as pd

df_iris = pd.DataFrame({'sepal_length': [5.0, 4.8, 5.8, 5.7, 4.5, 6.0, 6.3, 4.8, 5.6, 6.4],
                        'sepal_width': [3.2, 3.4, 2.8, 4.4, 2.3, 3.0, 2.5, 3.4, 3.0, 2.8],
                        'petal_length': [1.2, 1.6, 5.1, 1.5, 1.3, 4.8, 5.0, 1.9, 4.5, 5.6],
                        'petal_width': [0.2, 0.2, 2.4, 0.4, 0.3, 1.8, 1.9, 0.2, 1.5, 2.1],
                        'target': ['setosa', 'setosa', 'virginica', 'setosa', 'setosa','virginica',
                                   'virginica', 'setosa', 'versicolor', 'virginica']})

If your target y is the last of the n columns, you can use positional iloc slicing:

X = df_iris.iloc[:, :-1]
y = df_iris.iloc[:, -1]
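Note that iloc works purely by position, so the slicing above assumes the target really is the last column. As a hedged sketch for when it is not (the frame and its column names here are hypothetical), you can look up the target's position with columns.get_loc and select every other position:

```python
import pandas as pd

# Hypothetical frame where the target sits in the middle
df = pd.DataFrame({
    "a": [1, 2, 3],
    "target": [0, 1, 0],
    "b": [4, 5, 6],
})

# Integer position of the target column (column names must be unique)
target_pos = df.columns.get_loc("target")

# Keep every column position except the target's
feature_positions = [i for i in range(df.shape[1]) if i != target_pos]
X = df.iloc[:, feature_positions]
y = df.iloc[:, target_pos]

print(list(X.columns))
print(y.name)
```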

Another way is to use pop, which removes the column from the DataFrame in place and returns it for assignment (hence the copy, so that df_iris itself is not modified):

X = df_iris.copy()
y = X.pop('target')
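A small sketch of why the copy matters, using a cut-down hypothetical frame rather than the full setup above: pop mutates the object it is called on, so calling it on a copy leaves the original DataFrame intact.

```python
import pandas as pd

# Cut-down illustrative frame (values are made up)
df = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 5.8],
    "target": ["setosa", "setosa", "virginica"],
})

X = df.copy()        # work on a copy so df keeps its target column
y = X.pop("target")  # removes "target" from X and returns it as a Series

print(list(df.columns))
print(list(X.columns))
```

Calling df.pop("target") directly would also work, but df would then no longer contain the target column.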

Or use your own method with drop, which (without inplace=True) returns a new DataFrame and leaves df_iris unchanged:

X = df_iris.drop('target', axis=1)
y = df_iris['target']
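Putting it together, a minimal end-to-end sketch might look like the following. The data values are made up, and DecisionTreeClassifier, test_size and random_state are arbitrary choices for illustration; substitute your own model and split settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small illustrative frame (made-up values, not the real iris data)
df = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 5.8, 5.7, 4.5, 6.0, 6.3, 4.8],
    "petal_length": [1.2, 1.6, 5.1, 1.5, 1.3, 4.8, 5.0, 1.9],
    "target": ["setosa", "setosa", "virginica", "setosa",
               "setosa", "virginica", "virginica", "setosa"],
})

# Separate features and target before splitting
X = df.drop(columns="target")
y = df["target"]

# Split rows into training and test sets; X and y stay aligned
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on features only; y is passed as its own argument
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(list(X_train.columns))
print(len(predictions))
```

Because the target is removed before the split, neither X_train nor X_test contains it, and y_test is used only to check the predictions.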
