The scikit-learn API for Boosting Algorithms

Turns Out…

We can see from the scores above that our Naive Bayes model actually does a pretty good job of classifying spam and “ham.” However, let’s take a look at a few additional models to see if we can improve on it.

Specifically in this notebook, we will take a look at the following techniques:

  • BaggingClassifier
  • RandomForestClassifier
  • AdaBoostClassifier

Another really useful guide to ensemble methods can be found in the scikit-learn documentation.

These ensemble methods use a combination of techniques you have seen throughout this lesson:

  • Bootstrap the data passed to each learner (bagging).
  • Subset the features each learner uses (which, combined with bagging, gives random forests their two random components).
  • Ensemble learners so that those that perform best in certain areas have the largest impact (boosting).
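
To see how these three ideas map onto scikit-learn's API, here is a minimal sketch; the hyperparameter values below are illustrative placeholders, not tuned recommendations:

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)

# Bagging: each base learner (a decision tree by default) is fit on a
# bootstrap sample of the training data.
bagging = BaggingClassifier(n_estimators=200)

# Random forest: bootstrap samples plus a random subset of features at
# each split (the two random components mentioned above).
forest = RandomForestClassifier(n_estimators=200, max_features='sqrt')

# Boosting: learners are fit sequentially, each weighting more heavily
# the examples its predecessors misclassified.
boosting = AdaBoostClassifier(n_estimators=200, learning_rate=0.2)
```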

In this notebook, let’s get some practice with these methods, which will also help you get comfortable with the general process for performing supervised machine learning in Python.

Since you cleaned and vectorized the text in the previous notebook, this notebook can focus on the fun part: the machine learning.

This Process Looks Familiar…

In general, there is a five-step process you can follow each time you want to apply a supervised learning method (and which you actually used above):

  1. Import the model.
  2. Instantiate the model with the hyperparameters of interest.
  3. Fit the model to the training data.
  4. Predict on the test data.
  5. Score the model by comparing the predictions to the actual values.
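
For instance, here is what those five steps look like end to end with one of the models from this notebook. This is a minimal sketch: `X_train`, `X_test`, `y_train`, and `y_test` are assumed to be the train/test split of the vectorized text from the previous notebook.

```python
# 1. Import the model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 2. Instantiate the model with the hyperparameters of interest.
model = RandomForestClassifier(n_estimators=200)

# 3. Fit the model to the training data.
model.fit(X_train, y_train)

# 4. Predict on the test data.
preds = model.predict(X_test)

# 5. Score the model by comparing the predictions to the actual values.
print(accuracy_score(y_test, preds))
```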

Work through this notebook, applying these five steps to each of the ensemble methods: BaggingClassifier, RandomForestClassifier, and AdaBoostClassifier.

Step 1: First use the documentation to import all three of the models.
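
If you get stuck, here is a sketch of what those imports look like; all three classifiers live in scikit-learn's `sklearn.ensemble` module:

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
```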
