There are three common ways to iterate over the rows of a pandas DataFrame.
- iterrows(): returns the row index and the row contents as separate variables, but is significantly slower
- itertuples(): faster than iterrows(); each row is returned as a tuple together with its index, and ir[0] is the index
- zip: the fastest of the three, but zipping the columns alone gives no access to the row index
import pandas as pd
df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})
0. for i in df: does not iterate over the rows
for i in df:
    print(i)
Iterating over a DataFrame directly with for i in df yields the column labels rather than the rows, so we look at the following methods instead.
1. iterrows(): returns the row index and the row contents as separate variables, but is significantly slower
df.iterrows() yields one tuple per row: (index, Series)
count = 0
for i, r in df.iterrows():
    print(i, '-->', r, type(r))
    count += 1
    if count > 5:
        break
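Because each row comes back as a single Series, values from columns with different dtypes are converted to one common dtype. The following is a minimal sketch of that effect; the frame `mixed` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
mixed = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})

# Take only the first (index, Series) pair produced by iterrows().
first_index, first_row = next(mixed.iterrows())

print(mixed['n'].dtype)       # dtype of the original integer column
print(first_row.dtype)        # dtype of the row Series after upcasting
print(type(first_row['n']))   # the integer now arrives as a float scalar
```

If the original dtypes matter, itertuples() is usually the safer choice, since it does not build a Series per row.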
2. itertuples(): faster than iterrows(); each row is returned as a tuple together with its index, and ir[0] is the index
count = 0
for tup in df.itertuples():
    print(tup[0], '-->', tup[1:], type(tup[1:]))
    count += 1
    if count > 5:
        break
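The tuples produced by itertuples() are named tuples by default, so fields can also be read by name instead of by position. A short sketch on a small frame (the three-row `df` here is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)})

# Fields are available by attribute name as well as by position.
# The index is exposed as the attribute "Index".
named = [(row.Index, row.a, row.b) for row in df.itertuples()]
print(named)

# index=False drops the index field; name=None yields plain tuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Attribute access only works when the column names are valid Python identifiers; otherwise positional access as in the loop above is the fallback.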
3. zip: the fastest of the three, but zipping the columns alone gives no access to the row index
count = 0
for tup in zip(df['a'], df['b']):
    print(tup, type(tup[1:]))
    count += 1
    if count > 5:
        break
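If the index is needed after all, one possible workaround is to pass df.index to zip as an extra iterable. This is only a sketch; the string index labels below are invented for the example.

```python
import pandas as pd

# Hypothetical frame with a non-default index so the labels are visible.
df = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)},
                  index=['x', 'y', 'z'])

# Zip the index together with the columns; each tuple starts with the label.
rows = list(zip(df.index, df['a'], df['b']))
for label, a, b in rows:
    print(label, '-->', a, b)
```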
4. Performance Comparison
import time
df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

list1 = []
start = time.time()
for i, r in df.iterrows():
    list1.append((r['a'], r['b']))
print("iterrows elapsed:", time.time() - start)

list1 = []
start = time.time()
for ir in df.itertuples():
    list1.append((ir[1], ir[2]))
print("itertuples elapsed:", time.time() - start)

list1 = []
start = time.time()
for r in zip(df['a'], df['b']):
    list1.append((r[0], r[1]))
print("zip elapsed:", time.time() - start)
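A single time.time() measurement can be noisy, so the same comparison can also be sketched with the standard-library timeit module, taking the best of a few runs. The helper function names are made up for this example, and the actual timings will depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

def via_iterrows():
    # One Series is built per row, which is the main source of overhead.
    return [(r['a'], r['b']) for _, r in df.iterrows()]

def via_itertuples():
    # Each row is a named tuple; fields are read by attribute name.
    return [(t.a, t.b) for t in df.itertuples()]

def via_zip():
    # Iterates the two columns in parallel without any per-row object.
    return list(zip(df['a'], df['b']))

for fn in (via_iterrows, via_itertuples, via_zip):
    # Run each function once per repeat and keep the fastest of three runs.
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```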