Basic code for machine learning

Step 1: Import necessary libraries

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
```

Step 2: Prepare your data

We will use an example dataset with two categories (Positive and Negative) and several text examples for each. The code below assumes that data.csv has a text column and a label column, which are the names used in Step 3. We read the file with the Pandas library and print the first few rows.

```python
#Read data
data = pd.read_csv('data.csv')

#View the first five rows
print(data.head())
```
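The article does not show the contents of data.csv, so here is a minimal sketch of what such a file might look like. The four rows are made up for illustration; only the column names text and label come from the code in this article. The sketch reads the CSV from an in-memory string so that it runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real data.csv is not shown in the article.
csv_text = """text,label
I love this product,Positive
Terrible service and rude staff,Negative
Absolutely wonderful experience,Positive
I will never buy this again,Negative
"""

# pd.read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# Inspect the rows and check how many examples each category has.
print(sample.head())
print(sample["label"].value_counts())
```

Checking the label counts early is a cheap way to spot a heavily imbalanced dataset before training.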

Step 3: Create feature vectors and target variables

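A model cannot be trained on raw text, so we convert each text into a numeric feature vector. CountVectorizer does this by counting how often each vocabulary word occurs in a text; stop_words='english' drops common English words such as "the" and "and". We also convert the target variable (i.e. the categorical label) into integer codes with pd.factorize.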

```python
#Use CountVectorizer to create a feature vector
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data.text)

#Convert the target variable to a number
y = pd.factorize(data.label)[0]
```

Step 4: Split the dataset

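We split the dataset into a training set and a test set so that the model can be evaluated on data it did not see during training. The train_test_split function does this: test_size=0.2 holds out 20% of the rows for testing, and random_state=42 makes the split reproducible.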

```python
#Split the data set into the training set and the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
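The sketch below shows the effect of the split on a made-up array of ten samples. It also passes stratify, an optional parameter that the article's code does not use, to keep the class proportions the same in both parts; treat that as one possible refinement rather than part of the original method.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, 5 per class (invented for illustration).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0] * 5 + [1] * 5)

# stratify=y_demo is an addition not present in the article's code.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # 8 training rows, 2 test rows
print(np.bincount(y_te))       # one test sample from each class
```

Stratifying matters most for small or imbalanced datasets, where a purely random split can leave one class nearly absent from the test set.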

Step 5: Train the model

We train the model with MultinomialNB, a naive Bayes classifier that works on count features such as word frequencies and is commonly used for text classification.

```python
#Train the model
clf = MultinomialNB()
clf.fit(X_train, y_train)
```
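Once a model is trained, classifying new text requires passing it through the same fitted vectorizer with transform (not fit_transform), so that the columns line up with the ones seen in training. The following sketch uses a made-up four-sentence training set and hypothetical names (vec, clf_demo) to show this.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data, invented for illustration: 1 = positive, 0 = negative.
train_texts = ["great film", "great acting", "awful film", "awful plot"]
train_labels = [1, 1, 0, 0]

vec = CountVectorizer()
clf_demo = MultinomialNB()
clf_demo.fit(vec.fit_transform(train_texts), train_labels)

# transform reuses the training vocabulary; unseen words are ignored.
new_X = vec.transform(["great movie tonight", "awful ending"])

predictions = clf_demo.predict(new_X)
print(predictions)                    # [1 0]
print(clf_demo.predict_proba(new_X))  # class probabilities per text
```

Words that never appeared in training ("movie", "tonight", "ending") contribute nothing, so each prediction here rests on a single known word.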

Step 6: Evaluate the model

We predict labels for the test set and use accuracy_score to compute the fraction of predictions that match the true labels.

```python
#Evaluate the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
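Accuracy alone can hide which class the model gets wrong. As a possible complement, the sketch below computes a confusion matrix and a per-class report on made-up labels. The mapping of 0 to Negative and 1 to Positive is an assumption for this example; in the article's code the mapping depends on the order in which pd.factorize first sees each label.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up true and predicted labels, for illustration only.
y_true = [0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_hat)  # 5 correct out of 6
cm = confusion_matrix(y_true, y_hat)  # rows: true class, columns: predicted class

print("Accuracy:", acc)
print(cm)  # [[4 0], [1 1]]: one class-1 sample was predicted as class 0

# Assumed label names; check them against the factorize output in real use.
print(classification_report(y_true, y_hat, target_names=["Negative", "Positive"]))
```

Here the accuracy is about 0.83, yet half of the class-1 samples are misclassified, which only the confusion matrix and per-class recall reveal.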

The complete code looks like this:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

#Read data
data = pd.read_csv('data.csv')

#View the first five rows
print(data.head())

#Use CountVectorizer to create a feature vector
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data.text)

#Convert the target variable to a number
y = pd.factorize(data.label)[0]

#Split the data set into the training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

#Train the model
clf = MultinomialNB()
clf.fit(X_train, y_train)

#Evaluate the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
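One thing to be aware of in the code above: the vectorizer is fitted on all rows before the split, so its vocabulary also includes words from the test set. A common alternative, sketched here as a variation rather than the article's method, is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn pipeline, so the vocabulary is learned from the training rows only. The ten-row DataFrame is a made-up stand-in for data.csv.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for data.csv (same column names as in the article).
data = pd.DataFrame({
    "text": [
        "I love this phone", "Excellent battery and screen",
        "Wonderful service, friendly staff", "Great value, works perfectly",
        "Fantastic camera quality", "Terrible battery, awful screen",
        "I hate this phone", "Horrible service, rude staff",
        "Broken on arrival, poor quality", "Worst purchase, total waste",
    ],
    "label": ["Positive"] * 5 + ["Negative"] * 5,
})

# Split the raw text first, so the test rows stay unseen by the vectorizer.
texts_train, texts_test, y_train, y_test = train_test_split(
    data["text"], data["label"],
    test_size=0.2, random_state=42, stratify=data["label"],
)

# The pipeline fits CountVectorizer on the training text only.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts_train, y_train)

score = model.score(texts_test, y_test)  # accuracy on the held-out rows
prediction = model.predict(["wonderful phone"])
print("Accuracy:", score)
print(prediction)
```

The pipeline also accepts raw strings at prediction time, which removes the need to call the vectorizer by hand. With only two test rows the accuracy value here is not meaningful; the point is the structure.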

Origin blog.csdn.net/qq_71356343/article/details/132921427