Methods for iterating over the rows of a pandas DataFrame

There are three common ways to iterate over the rows of a pandas DataFrame:

  1. iterrows(): returns each row's index and contents as separate variables, but is noticeably slower
  2. itertuples(): faster than iterrows(); each row comes back as a tuple that includes the index, and tup[0] is the index
  3. zip: the fastest, but it does not give you the row index

import pandas as pd

# Sample data used in the examples: 10,000 rows, two integer columns
df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

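0. for i in df: does not iterate over rows

# Iterating a DataFrame directly yields its column labels ('a' and 'b'), not its rows
for i in df:
    print(i)

Because a plain for ... in df loop visits column labels rather than rows, the following methods are needed to traverse the rows.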
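To see why, here is a minimal check (a sketch using a made-up three-row frame) showing that a direct loop over a DataFrame visits its column labels, the same values as df.columns, and runs once per column rather than once per row:

```python
import pandas as pd

# Small illustrative frame; only the column names matter here
df = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)})

# Iterating a DataFrame directly walks its column labels, much like iterating a dict's keys
labels = [col for col in df]
print(labels)  # ['a', 'b']

# The loop ran once per column (2), not once per row (3)
print(len(labels), len(df))  # 2 3
```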

1. iterrows(): returns each row's index and contents as separate variables, but is noticeably slower

df.iterrows() yields one tuple per row, of the form (index, Series):

# Print the first 6 rows; each r is a Series indexed by the column names
for i, r in df.head(6).iterrows():
    print(i, '-->', r, type(r))
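Because each row arrives as a single Series, iterrows() does not preserve column dtypes across a row (the pandas documentation notes this). A minimal sketch with a hypothetical frame holding an int64 column and a float64 column shows the integer values being upcast:

```python
import pandas as pd

# Hypothetical frame: 'a' is int64, 'c' is float64
df = pd.DataFrame({'a': [1, 2], 'c': [0.5, 1.5]})

rows = []
for idx, row in df.iterrows():
    # One Series per row means one dtype per row, so the ints in 'a' become float64
    rows.append((idx, row['a'], row.dtype))
    print(idx, row['a'], row.dtype)  # prints "0 1.0 float64", then "1 2.0 float64"
```

If the original dtypes matter, itertuples() or zip over the columns avoids this, since neither packs a row into a Series.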

 

2. itertuples(): faster than iterrows(); the index is returned together with the row values, and tup[0] is the index

# Print the first 6 rows; tup[0] is the index, and slicing off the rest gives a plain tuple
for tup in df.head(6).itertuples():
    print(tup[0], '-->', tup[1:], type(tup[1:]))
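The tuples from itertuples() are namedtuples, so fields can also be read by name, and the index and namedtuple wrapper can be switched off with the index and name parameters. A minimal sketch on a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)})

for row in df.itertuples():
    # Fields can be read by name as well as by position: row.Index is the same as row[0]
    print(row.Index, row.a, row.b)

# index=False drops the Index field; name=None yields plain tuples instead of namedtuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Named access depends on the column names: names that are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional field names, so positional access is the safer choice for arbitrary frames.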

 

3. zip: the fastest, but it does not give you the row index

from itertools import islice

# zip() over the two columns yields plain (a, b) tuples; print the first 6
for tup in islice(zip(df['a'], df['b']), 6):
    print(tup, type(tup))
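zip itself carries no index, but when the index is needed, one option is to pass df.index as an extra iterable. A minimal sketch, using a hypothetical frame with a string index:

```python
import pandas as pd

# Illustrative frame with a non-default index
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}, index=['x', 'y', 'z'])

# df.index is zipped in alongside the columns, so each tuple starts with the row label
for idx, a, b in zip(df.index, df['a'], df['b']):
    print(idx, a, b)
```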

 

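4. Performance comparison

import time

import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

# iterrows: a Series is built for every row
list1 = []
start = time.perf_counter()
for i, r in df.iterrows():
    list1.append((r['a'], r['b']))
print("iterrows elapsed  :", time.perf_counter() - start)

# itertuples: ir[0] is the index, so the columns start at ir[1]
list1 = []
start = time.perf_counter()
for ir in df.itertuples():
    list1.append((ir[1], ir[2]))
print("itertuples elapsed:", time.perf_counter() - start)

# zip: walks the column values directly, with no index and no per-row object
list1 = []
start = time.perf_counter()
for r in zip(df['a'], df['b']):
    list1.append((r[0], r[1]))
print("zip elapsed       :", time.perf_counter() - start)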
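A single timed run is easily skewed by other work on the machine. A slightly sturdier variant of the comparison, sketched with the standard-library timeit module, wraps each approach in a function (the function names are hypothetical), first checks that all three collect the same (a, b) pairs, and reports the best of three runs. No timings are given here, since they depend on the machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

def with_iterrows():
    return [(r['a'], r['b']) for _, r in df.iterrows()]

def with_itertuples():
    # t[0] is the index, so the columns are t[1] and t[2]
    return [(t[1], t[2]) for t in df.itertuples()]

def with_zip():
    return list(zip(df['a'], df['b']))

# Sanity check: all three approaches collect the same (a, b) pairs
assert with_iterrows() == with_itertuples() == with_zip()

# Best of 3 single calls per approach; taking the minimum reduces noise
for fn in (with_iterrows, with_itertuples, with_zip):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__:16s}: {best:.4f} s")
```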

 

Source: www.cnblogs.com/wqbin/p/11775812.html