Integers in Python and Integer Overflow in Numpy

A classmate sent me a screenshot and asked why some of the results in it were negative.

Looking at the screenshot, my first thought was data overflow. When a value exceeds the maximum that its type can represent, strange results appear.
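To make "strange results" concrete, here is a minimal sketch of what happens when a value goes one past the largest signed 32-bit integer. It uses the standard library's `ctypes.c_int32`, which does no overflow checking, only as a stand-in for a fixed-width C integer; the variable names are chosen for illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

# ctypes integer types silently keep only the low 32 bits, like a C int.
at_limit = ctypes.c_int32(INT32_MAX).value
past_limit = ctypes.c_int32(INT32_MAX + 1).value

print(at_limit)    # 2147483647
print(past_limit)  # -2147483648: one step past the maximum wraps to the minimum
```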

Then he sent another screenshot showing print(100000 * 208378), that is, directly printing E[0] * G[0] from the first screenshot. The result was 20837800000, which is correct.

So the new question is: if the data in the screenshot overflowed, why did multiplying the same numbers directly not overflow?

I had always ignored the rules of data representation (what is the upper limit of an integer?), I did not know much about Numpy, and I had also misread the screenshot and assumed every result in it was wrong. So I could not answer.

Finally, after following some discussion in the group, I understood what was going on, so this article sorts out the relevant points.

Before we officially begin, here are the topics raised by the screenshot:

  • What is the upper limit of an integer in Python 3? What about Python 2?
  • What is the upper limit of an integer in Numpy? What should you do when an integer overflows?

On the first question, let's look at Python 2 first. It has two kinds of integers:

  • One is the short integer, the ordinary integer people usually mean. It is represented by int and has a built-in function int(). Its size is limited and can be checked with sys.maxint (the value depends on whether the platform is 32-bit or 64-bit)
  • The other is the long integer, an integer of unbounded size. It is represented by long and has a built-in function long(). Its literals are written with an uppercase L or lowercase l suffix, such as 1000L

When an integer falls outside the range of the short integer, Python 2 automatically switches to a long integer. For example, evaluating 2**100 at the interactive prompt shows a result ending with the letter L, which marks it as a long integer.

But in Python 3 the situation is different: there is only one built-in integer type, written int. In form it is Python 2's short integer, but in fact it can represent an unbounded range and behaves more like a long integer. No matter how large the number is, no trailing L is needed to mark the distinction.

That is, Python 3 merges the two integer representations. Users no longer need to tell them apart, and everything is handled by the underlying implementation as needed.

In theory, integers in Python 3 have no upper limit (as long as they fit in memory). This explains why directly printing the product of the two numbers earlier gave the correct result.
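A short sketch of this behavior in Python 3 follows. The variable names are illustrative. Note that `sys.maxsize` still exists, but it describes the platform's index size rather than a limit on int.

```python
import sys

big = 2 ** 100
print(type(big))          # <class 'int'>, with no separate long type
print(big.bit_length())   # 101 bits, far beyond any 32-bit or 64-bit machine integer

# The multiplication from the screenshot, done with plain Python ints.
product = 100000 * 208378
print(product)            # 20837800000

# Going past sys.maxsize does not wrap around.
beyond = sys.maxsize + 1
print(beyond > sys.maxsize)  # True
```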

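PEP-237 (Unifying Long Integers and Integers) describes this change. It explains the purpose as follows:

This gives new Python programmers (whether they are new to programming or not) one less thing to learn before they can start using the language.

Python hides many tedious chores at the language level, such as memory allocation, so we do not need to worry about them when using objects such as strings, lists, or dictionaries. The change to the integer type serves the same goal of convenience. (The downside is some loss of efficiency, which I will not go into here.)

Back to the second topic: what is the upper limit of an integer in Numpy?

Because Numpy is implemented in C, it follows C's rules for representing integers: integers have fixed widths, and ordinary integers are distinguished from long integers.

Here is one way to check: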

import numpy as np

a = np.arange(2)
type(a[0])

# Result: numpy.int32

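In other words, its default integer here is 32 bits wide, with a range of -2147483648 to 2147483647. (The default depends on the platform and the Numpy version; many 64-bit Linux and macOS installations report numpy.int64 instead.)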
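Rather than memorizing these bounds, you can ask Numpy for them. The following is a small sketch using `np.iinfo`, which reports the limits of an integer dtype. The choice of dtypes to list is mine, and the last line prints whatever default your own installation uses.

```python
import numpy as np

# Print the representable range of several fixed-width integer types.
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# The default integer dtype on the current platform (int32 or int64).
default_dtype = np.arange(2).dtype
print(default_dtype)
```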

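Checking against the earlier screenshot, only two pairs of numbers did not overflow when multiplied: 100007*4549 and 100012*13264. All the other pairs overflowed, which is why the strange negative results appeared.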
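This can be checked without Numpy. The sketch below uses only the three pairs mentioned in this article and a hypothetical helper, `wrap_int32`, that imitates two's-complement wraparound with modular arithmetic. It models the behavior and is not Numpy's actual code.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(n):
    """Reduce an exact Python int to what a signed 32-bit integer would hold."""
    return (n + 2**31) % 2**32 - 2**31


pairs = [(100000, 208378), (100007, 4549), (100012, 13264)]
results = {}
for x, y in pairs:
    exact = x * y                    # exact product as a Python int
    overflows = exact > INT32_MAX    # does it exceed the int32 range?
    results[(x, y)] = (exact, overflows, wrap_int32(exact))
    print(x, y, exact, overflows, wrap_int32(exact))
```

Only the first pair exceeds the int32 range, and its wrapped value is negative (-637036480). The other two products fit and are unchanged.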

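Numpy supports more data types than Python does, and the boundaries between them vary widely:

Screenshot source: https://www.runoob.com/numpy/numpy-dtype.html

To solve the integer overflow problem, you can specify the dtype: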

import numpy as np

q = [100000]
w = [500000]

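# An example that overflows:
a = np.array(q)
b = np.array(w)
print(a*b)  # overflows when the default integer is int32; the result is a strange value

# An example that fixes it:
c = np.array(q, dtype='int64')
d = np.array(w, dtype='int64')
print(c*d)  # no overflow: [50000000000]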
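If the arrays already exist as int32, one option is to widen them before multiplying. The sketch below sets dtype=np.int32 explicitly so that the overflow reproduces on any platform. Using `astype` here is my assumption about a reasonable approach, not something taken from the original screenshot.

```python
import numpy as np

a = np.array([100000], dtype=np.int32)
b = np.array([500000], dtype=np.int32)

# int32 * int32 stays int32, so the product silently wraps around.
wrapped = a * b
print(wrapped)   # [-1539607552]

# Widen to int64 first; the product now fits.
widened = a.astype(np.int64) * b.astype(np.int64)
print(widened)   # [50000000000]

# Converting elements to Python ints also gives the exact result.
exact = int(a[0]) * int(b[0])
print(exact)     # 50000000000
```

Note that int64 has an upper limit as well (9223372036854775807), so widening postpones overflow rather than removing it.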

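That answers the questions raised earlier. To wrap up:

  • Python 3 greatly simplifies the representation of integers. The effect can be stated as: there is only one kind of integer (int), and no other integer types (long, int8, int64, and so on)
  • Integer types in Numpy correspond to C data types. Each kind of "integer" has its own range, and to solve an overflow problem you need to specify a larger data type (dtype)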

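WeChat official account [Python猫]: this account serializes quality article series, including the Meow Star Philosophy Cat series, the Advanced Python series, book recommendations, technical writing, and recommendations and translations of quality English articles. You are welcome to follow.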


Origin www.cnblogs.com/pythonista/p/11503117.html