[Pandas] Fixing negative results when multiplying two positive numbers in pandas

A few days ago I used sklearn to fit some data and found that the fitted curve was wildly off. I spent a long time looking for the cause and finally traced it to the code where I multiply features together to increase the feature dimension: the product of two positive numbers was coming out negative.

In the figure below, xi is the first feature of every sample, xj is the second feature, and xij is the result of multiplying the two features.
[Figure: values of xi, xj, and xij]
Testing with the following code showed the cause: the values are too large. Multiplying them directly as Python integers gives the correct result, but multiplying them in pandas gives a negative value.

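import numpy as np
import pandas as pd

if __name__ == '__main__':
    a = 10000000000
    b = 70000000000
    print("a:{}, b:{}".format(a, b))
    # Python ints have arbitrary precision, so this product is exact
    print("a * b computed directly:\n", a*b)
    # Inside a DataFrame the values are stored as fixed-width integers
    pda = pd.DataFrame(np.array([a]))
    pdb = pd.DataFrame(np.array([b]))
    print("a * b computed with pandas:")
    print(pda * pdb)

Output:

a:10000000000, b:70000000000
a * b computed directly:
 700000000000000000000
a * b computed with pandas:
                   0
0 -976274800962961408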

The result is negative because the original values are too large: the product exceeds the range of the fixed-width integer type that pandas uses to store the data (int64 here, whose maximum is 9223372036854775807, about 9.22e18), so it overflows and silently wraps around. NumPy behaves the same way.
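To see where that particular negative number comes from, here is a minimal sketch (my own illustration, not part of the original test). It looks up the int64 limits with np.iinfo and reproduces the wraparound using plain Python integers. The helper name wrap_int64 is hypothetical.

```python
import numpy as np

a = 10000000000
b = 70000000000

info = np.iinfo(np.int64)
exact = a * b  # Python int, so this value is exact


def wrap_int64(value):
    """Reduce an exact Python int to what a 64-bit two's-complement integer would hold."""
    return (value + 2**63) % 2**64 - 2**63


print("int64 range:", info.min, "to", info.max)
print("exact product fits in int64:", info.min <= exact <= info.max)
print("wrapped product:", wrap_int64(exact))
```

The wrapped value matches the number pandas printed above, which suggests the data in that test was stored as int64.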

The common integer dtypes are int8, int16, int32, and int64. If the overflow happens because the pandas data is stored as int32, and the product still fits in the int64 range, then changing the dtype from int32 to int64 is enough:

# This only helps when the data was int32 and the product fits in int64.
# With the a and b from the test above, the product (7e20) overflows int64 as well.
pda = pd.DataFrame(np.array([a]), dtype='int64')
pdb = pd.DataFrame(np.array([b]), dtype='int64')
res = pda * pdb
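As a small sketch of the case where widening does help, the values below are hypothetical ones I picked so that each fits in int32 but their product (2,500,000,000) does not. The int32 product wraps to a negative value, while the int64 product is correct.

```python
import pandas as pd

# Hypothetical values: each fits in int32, but their product does not
x = pd.Series([50000], dtype="int32")
y = pd.Series([50000], dtype="int32")

overflowed = x * y                               # result stays int32 and wraps around
widened = x.astype("int64") * y.astype("int64")  # int64 has room for the product

print(overflowed.dtype, int(overflowed.iloc[0]))
print(widened.dtype, int(widened.iloc[0]))
```

Converting existing columns with astype("int64") before multiplying should have the same effect as passing dtype='int64' when the DataFrame is built.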

If the two numbers are so large that their product exceeds the range int64 can represent, we can normalize the data to (-1, 1) or (0, 1) first. Multiplying or squaring the normalized data then gives results in a normal range.
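A minimal sketch of the normalization approach, assuming simple min-max scaling (the original post does not say which normalization was used). The column values and the helper name min_max_scale are hypothetical, and the sketch assumes no column is constant, since that would make the denominator zero.

```python
import pandas as pd

# Hypothetical feature columns with values as large as those in the test above
df = pd.DataFrame({
    "xi": [10000000000, 40000000000, 70000000000],
    "xj": [70000000000, 20000000000, 50000000000],
})


def min_max_scale(column):
    """Scale a column to the range [0, 1] as float64."""
    column = column.astype("float64")
    return (column - column.min()) / (column.max() - column.min())


scaled = df.apply(min_max_scale)             # scale each column independently
scaled["xij"] = scaled["xi"] * scaled["xj"]  # product of two values in [0, 1]
print(scaled)
```

Because both scaled features lie between 0 and 1, their product does too, so it cannot overflow.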

Origin blog.csdn.net/qq_41340996/article/details/120571155