How to speed up row iteration in Pandas by 150 times

Full text: 1,455 words; estimated reading time: 5 minutes

Source: Pexels

 

Let's face it: Python's speed, compared with C or Go, has drawn plenty of criticism.

For some time, this made me skeptical of Python's ability to get tasks done quickly.

Currently I am trying to do data science in Go, which is possible, but the work is simply not as pleasant as it is in Python, mostly because Go is statically typed and data science is largely an exploratory field.

That is not to say that completely rewriting a solution in Go cannot improve performance, but that is a topic for another article.

Until now, at least, I had overlooked that Python can handle tasks faster than I assumed. I had been suffering from tunnel vision: the syndrome where you see only one solution and completely ignore the existence of others. I am sure I am not the only one.

That is why today I want to outline how to make everyday work with Pandas faster and more pleasant. More specifically, this example focuses on iterating over rows and performing a data operation along the way. So, without further ado, let's get to the point.

 


 

Make a data set

 

The easiest way to make the point is to declare a single DataFrame object holding the integers from 1 to 100,000.
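
The original code listing did not survive in this copy, so here is a minimal sketch of what such a declaration might look like. The column name Values is an assumption, taken from the "Values" attribute mentioned later in the text.

```python
import pandas as pd

# Assumption: a single column named "Values" holding the integers 1..100,000
df = pd.DataFrame({"Values": range(1, 100_001)})
```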

 

 

You really don't need anything more complicated than this to address the Pandas speed problem. To verify that everything went well, check the first few rows and the overall shape of the dataset.
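
A check along these lines should do, again assuming the hypothetical Values column name:

```python
import pandas as pd

df = pd.DataFrame({"Values": range(1, 100_001)})

print(df.head())   # first five rows: Values 1 through 5
print(df.shape)    # (rows, columns) -> (100000, 1)
```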

 

 

With the preparation done, let's look at how to iterate over the rows of a data frame, and how not to. First, how not to.

 

Here is what you should not do

 

Ah, I have used (and overused) the iterrows() method so many times. It is very slow by default, but you already know why I never bothered to look for alternatives (tunnel vision).

To show that you should not use iterrows() to iterate over a data frame, I'll make a quick demo: declare a variable, initially set to 0, then increment it at each iteration by the current value of the Values column.

In case you're wondering, the %%time magic command reports how long, in seconds or milliseconds, the cell took to complete all its operations.

 

Let's see how long it takes to run.
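
The demo described above can be sketched roughly as follows. Since %%time only works inside a Jupyter cell, this version uses time.perf_counter() as a stand-in so it also runs as a plain script; the variable names are my own, and the timing will differ from machine to machine.

```python
import time

import pandas as pd

df = pd.DataFrame({"Values": range(1, 100_001)})  # "Values" is an assumed column name

start = time.perf_counter()
total = 0
for _, row in df.iterrows():      # every row is materialised as a pandas Series
    total += row["Values"]        # Series values are accessed with brackets
iterrows_seconds = time.perf_counter() - start

print(f"total = {total}, iterrows() took {iterrows_seconds:.2f} s")
```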

 

 

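You might now be thinking that 15 seconds to iterate over 100,000 rows and increment an external variable isn't much. But it actually is; the next section explains why.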

 

Here is what you should do

 

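Now a magic method comes to the rescue: itertuples(). As the name suggests, itertuples() loops over the rows of the data frame and returns each one as a named tuple. This is why you cannot access the values by column name with brackets [] and need to use . notation instead.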
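
To make the access difference concrete, here is a small sketch on a three-row frame (the frame and the Values column name are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3]})

first = next(df.itertuples())   # a named tuple with fields Index and Values
print(first.Values)             # dot notation, by column name
print(first[1])                 # positional access works too; position 0 is the index

try:
    first["Values"]             # string keys are not supported on tuples
except TypeError as err:
    print("TypeError:", err)
```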

 

Now I'll run the same example as a few minutes ago, but using the itertuples() method.
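
A sketch of that version might look like this. As before, time.perf_counter() stands in for %%time, the Values column name is an assumption, and the measured time depends on your machine and Pandas version.

```python
import time

import pandas as pd

df = pd.DataFrame({"Values": range(1, 100_001)})  # "Values" is an assumed column name

start = time.perf_counter()
total = 0
for row in df.itertuples():       # each row is a lightweight named tuple
    total += row.Values           # dot notation instead of brackets
itertuples_seconds = time.perf_counter() - start

print(f"total = {total}, itertuples() took {itertuples_seconds:.4f} s")
```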

 

 

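There you go! The same operation with itertuples() ran about 154 times faster! Now imagine your everyday work, where you are dealing with millions of rows: itertuples() can save you a great deal of time.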

 


 

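In this simple example, we have seen how a small change to the code can have a huge impact on the overall result.

This doesn't mean itertuples() will be 150 times faster than iterrows() in every scenario, but it does suggest that it will be at least somewhat faster each time.

Thanks for reading; I hope you found it useful!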


Origin blog.csdn.net/duxinshuxiaobian/article/details/104534317