Notes and "errata" for "Data Science Technology and Application" (Electronic Engineering 2018 Edition), based on Python 3.7 (updated 2021.04.11)


Preface

While studying the course "Data Science Technology and Application", I found that the textbook's examples were written for an older Python environment, so a small number of the functions and parameters it uses no longer work, or have been renamed, under Python 3.6 and later. This article corrects these "errors" on the basis of Python 3.7 (once all of them have been collected, I will send them to the editor-in-chief in a single e-mail), and also records the comments I make while learning.



1. Correction of "errors"

1. Matplotlib reports an error when drawing a histogram: AttributeError: 'Rectangle' object has no property 'normed'

Coordinates: "Chapter 4 Data Visualization" - "4.2 Visual Data Exploration" - "4.2.1 Drawing Common Graphics" - "5. Histogram" (P73) - "Parameter Description" - "normed"

The textbook draws a histogram with pandas as follows:

values.hist(bins=100, alpha=0.3, color='k', normed=True)

When running the code, the following error message appears:

AttributeError: 'Rectangle' object has no property 'normed'

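Cause and solution

pandas passes extra keyword arguments of hist() through to Matplotlib, and the normed parameter was deprecated in Matplotlib 2.1 and removed in Matplotlib 3.1. Newer versions therefore no longer recognize it and raise the error above.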

  1. You can remove this parameter, that is,
    values.hist(bins=100, alpha=0.3, color='k')
  2. You can also use the density parameter instead, that is,
    values.hist(bins=100, alpha=0.3, color='k', density=True)
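
The second fix can be sketched as a small runnable script. The textbook's values variable is not reproduced in this article, so the Series below is a hypothetical stand-in filled with random numbers; the Agg backend is likewise only an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import numpy as np
import pandas as pd

# Hypothetical stand-in for the textbook's `values`: 1000 normal samples
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=1000))

# `normed=True` replaced by `density=True`
ax = values.hist(bins=100, alpha=0.3, color="k", density=True)

# With density=True, the areas of all bars add up to 1
total_area = sum(p.get_height() * p.get_width() for p in ax.patches)
print(len(ax.patches), round(total_area, 6))
```

Summing height times width over the bars is only a sanity check that the histogram really is normalized; it is not needed in normal use.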

Note: The normed/density parameter controls whether the histogram is normalized. The default value is False.

  1. When density = False, the bar heights are the raw counts in each bin, i.e. an ordinary frequency histogram (see Figure 1 below);
  2. When density = True, the bar heights are scaled so that the total area of the bars is 1, i.e. the data is shown as a probability density (see Figure 2 below).

Figure 1
Figure 2
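
The difference between the two settings can also be checked numerically. The sketch below uses numpy.histogram, which Matplotlib's hist relies on for binning, with made-up random data (the variable names are illustrative only, not from the textbook).

```python
import numpy as np

# Made-up sample data standing in for the textbook's values
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# density=False (default): each entry is the number of samples in that bin
counts, edges = np.histogram(data, bins=100)

# density=True: each entry is count / (total samples * bin width)
dens, _ = np.histogram(data, bins=100, density=True)

widths = np.diff(edges)
count_total = counts.sum()     # equals the number of samples
area = (dens * widths).sum()   # approximately 1

print(count_total, area)
```

So density=True changes only the scale of the vertical axis; the shape of the histogram stays the same.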

2. cross_validation has been deprecated since scikit-learn 0.18 and removed in 0.20

Coordinates: "Chapter 5 Machine Learning" - "5.2.3 Regression Analysis Performance Evaluation" - "Example 5-2"

Importing the module as the textbook does produces a warning or an error, depending on the installed scikit-learn version:

from sklearn import cross_validation

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
from sklearn.cross_validation import *

ModuleNotFoundError: No module named 'sklearn.cross_validation'

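Cause and solution

The cross_validation module was deprecated in scikit-learn 0.18 and removed entirely in version 0.20; its classes and functions were moved to the model_selection module.
In most cases we only need to change cross_validation to model_selection in the import. As the warning points out, the cross-validation iterators (such as KFold) have a different interface in the new module, so code that uses them may need further changes.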
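
A minimal sketch of the new import is shown below. Example 5-2 itself is not reproduced in this article, so the data here is synthetic and the variable names are assumptions; only the imports from sklearn.model_selection are the point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
# Old: from sklearn.cross_validation import train_test_split, cross_val_score, KFold
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic regression data (assumption, not the textbook's data set)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=200)

# train_test_split is called the same way as before
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# KFold changed: the old module used KFold(n, n_folds=5);
# the new one takes n_splits and receives the data later via split()/cv=
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=kfold)

print(test_r2, cv_scores.mean())
```

Functions such as train_test_split and cross_val_score only need the import changed, while iterator classes such as KFold also need their constructor arguments updated.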

2. Comments






Origin blog.csdn.net/A_No2Tang/article/details/115606484