Pandas | 08 Reindexing

Reindexing changes the row and column labels of a DataFrame.

Reindexing lets you do the following:

  • Reorder existing data to match a new set of labels.
  • Insert missing-value (NA) markers at label positions for which no data exists.

import pandas as pd
import numpy as np

N = 20

df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N - 1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})
print(df)
print('\n')

# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A, C, B.
# Column 'B' does not exist in df, so it is filled with NaN.
df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
print(df_reindexed)

Output:

            A     x         y       C           D
0  2016-01-01   0.0  0.910736     Low  105.308796
1  2016-01-02   1.0  0.570500     Low   91.024238
2  2016-01-03   2.0  0.930298    High  112.359308
3  2016-01-04   3.0  0.251355  Medium  106.155192
4  2016-01-05   4.0  0.579235     Low   90.079651
5  2016-01-06   5.0  0.623852    High  110.592218
6  2016-01-07   6.0  0.621130  Medium   96.222673
7  2016-01-08   7.0  0.989647  Medium   92.253444
8  2016-01-09   8.0  0.506653  Medium  102.601417
9  2016-01-10   9.0  0.099482     Low   97.721659
10 2016-01-11  10.0  0.254750  Medium   75.502131
11 2016-01-12  11.0  0.543014  Medium   88.895951
12 2016-01-13  12.0  0.911283  Medium   79.526056
13 2016-01-14  13.0  0.255296     Low   92.248119
14 2016-01-15  14.0  0.205302     Low  103.301747
15 2016-01-16  15.0  0.246407     Low  107.158250
16 2016-01-17  16.0  0.202039    High   96.411279
17 2016-01-18  17.0  0.734529    High   88.177103
18 2016-01-19  18.0  0.275703  Medium   82.885365
19 2016-01-20  19.0  0.084449    High   98.803349

           A     C    B
0 2016-01-01   Low  NaN
2 2016-01-03  High  NaN
5 2016-01-06  High  NaN
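
The same two operations (reordering and NaN insertion) can be seen on a plain Series. This is a minimal sketch with made-up labels and values; fill_value is an existing reindex() parameter that substitutes a value for missing labels instead of NaN.

```python
import pandas as pd

# Hypothetical example data: a Series with string labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reorder existing labels and add a new label 'd' that has no data.
# 'd' becomes NaN, so the dtype is upcast from int64 to float64.
reordered = s.reindex(['c', 'a', 'd'])
print(reordered)

# fill_value is used for missing labels instead of NaN (dtype stays int64)
filled = s.reindex(['c', 'a', 'd'], fill_value=0)
print(filled)
```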

Reindexing to align with other objects

You may want to reindex an object so that its axes are labeled the same as those of another object. Consider the following example.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['col1', 'col2', 'col3'])
print(df1)
print(df2)

df1 = df1.reindex_like(df2)  # keep only the row and column labels that df2 has
print(df1)

Output:

       col1      col2      col3
0  0.989992  0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3  1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670  0.468700
5 -0.164076  0.075098  0.654549
6 -0.491034  1.096496 -0.166250
7  0.230918 -1.561643  1.501326
8  0.703623 -0.407445 -0.792633
9  0.340817 -1.132127 -0.695821

       col1      col2      col3
0  0.144380  0.295776 -0.743097
1 -1.597853  0.029949 -1.605222
2  0.626728 -0.077997 -0.167353
3  0.466008  0.695279 -0.047752
4 -1.088821 -0.456605  1.192847
5 -0.020330  1.616297 -0.368196
6 -1.038790 -1.264894  0.059060

       col1      col2      col3
0  0.989992  0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3  1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670  0.468700
5 -0.164076  0.075098  0.654549
6 -0.491034  1.096496 -0.166250

Note - Here, the DataFrame df1 is reindexed to match df2: it keeps only the row and column labels that df2 has. The column names should match; otherwise, every column label that df1 lacks is added as a column filled entirely with NaN.
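
To see what happens when the column names do not match, here is a small sketch with hypothetical frames where df2 has a column (col4) that df1 lacks. It also checks the assumption that reindex_like(other) behaves like reindex() called with other's index and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: df2 has fewer rows and a column 'col4' that df1 lacks
df1 = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.zeros((2, 3)), columns=['col1', 'col2', 'col4'])

# Conform df1 to df2's labels: rows 2-3 and 'col3' are dropped,
# and 'col4' is added as an all-NaN column
aligned = df1.reindex_like(df2)

# The same result, spelled out with reindex()
explicit = df1.reindex(index=df2.index, columns=df2.columns)

print(aligned)
print(aligned.equals(explicit))
```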

Filling while reindexing

reindex() takes an optional method parameter that selects the fill method. Its values are as follows:

  • pad/ffill - fill values forward
  • bfill/backfill - fill values backward
  • nearest - fill from the nearest index value

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Rows 2-5 do not exist in df2, so they are padded with NaN
print(df2.reindex_like(df1))
print('\n')

# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))

Output:

         col1        col2       col3
0    1.311620   -0.707176   0.599863
1   -0.423455   -0.700265   1.133371
2         NaN         NaN        NaN
3         NaN         NaN        NaN
4         NaN         NaN        NaN
5         NaN         NaN        NaN

Data Frame with Forward Fill:
         col1        col2        col3
0    1.311620   -0.707176    0.599863
1   -0.423455   -0.700265    1.133371
2   -0.423455   -0.700265    1.133371
3   -0.423455   -0.700265    1.133371
4   -0.423455   -0.700265    1.133371
5   -0.423455   -0.700265    1.133371

Note - The last four rows (2 through 5) are filled forward with the values from row 1.
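
The example above only uses ffill. The following sketch compares all three fill methods on a hypothetical Series whose integer index has gaps; the index is sorted, because these methods require a monotonic index.

```python
import pandas as pd

# Hypothetical Series with gaps in its (monotonic increasing) index
s = pd.Series([1.0, 2.0, 3.0], index=[0, 4, 8])
new_index = [0, 1, 3, 5, 8]

ffilled = s.reindex(new_index, method='ffill')    # value of the previous existing label
bfilled = s.reindex(new_index, method='bfill')    # value of the next existing label
nearest = s.reindex(new_index, method='nearest')  # value of the closest existing label

print(pd.DataFrame({'ffill': ffilled, 'bfill': bfilled, 'nearest': nearest}))
```

For label 3, ffill takes the value at label 0, bfill takes the value at label 4, and nearest also picks label 4, because it is closer.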

Limiting the fill while reindexing

The limit parameter gives additional control over filling during reindexing: it specifies the maximum number of consecutive elements to fill.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Rows 2-5 do not exist in df2, so they are padded with NaN
print(df2.reindex_like(df1))
print('\n')

# Fill forward from the preceding values, but fill at most 1 consecutive row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))

Output:

         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2         NaN         NaN         NaN
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN

Data Frame with Forward Fill limiting to 1:
         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2   -0.055713   -0.021732   -0.174577
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN

Note - Only row 2 is filled, from the preceding row 1. The remaining rows stay NaN.
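
Because limit counts consecutive filled elements, raising it fills more rows. This is a minimal sketch on a hypothetical two-element Series, extended to six labels with and without a limit.

```python
import pandas as pd

# Hypothetical Series with labels 0 and 1 only
s = pd.Series([1.0, 2.0], index=[0, 1])
new_index = list(range(6))

no_limit = s.reindex(new_index, method='ffill')          # labels 2-5 all filled
limit_2 = s.reindex(new_index, method='ffill', limit=2)  # only labels 2 and 3 filled

print(pd.DataFrame({'no limit': no_limit, 'limit=2': limit_2}))
```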

Renaming

The rename() method relabels an axis based on a mapping (a dict or Series) or an arbitrary function.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)
print('\n')

print("After renaming the rows and columns:")
print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                 index={0: 'apple', 1: 'banana', 2: 'durian'}))

Output:

         col1        col2        col3
0    0.486791    0.105759    1.540122
1   -0.990237    1.007885   -0.217896
2   -0.483855   -1.645027   -1.194113
3   -0.122316    0.566277   -0.366028
4   -0.231524   -0.721172   -0.112007
5    0.438810    0.000225    0.435479

After renaming the rows and columns:
                c1          c2        col3
apple     0.486791    0.105759    1.540122
banana   -0.990237    1.007885   -0.217896
durian   -0.483855   -1.645027   -1.194113
3        -0.122316    0.566277   -0.366028
4        -0.231524   -0.721172   -0.112007
5         0.438810    0.000225    0.435479
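
The example above uses dicts. The sketch below also passes functions, with a hypothetical two-column frame. Note that labels missing from a mapping are left unchanged, which is why col3 and rows 3-5 keep their labels in the output above.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis
upper = df.rename(columns=str.upper)

# Labels missing from a mapping are left unchanged
partial = df.rename(columns={'col1': 'c1'})

# Index labels can be transformed by a function as well
shifted = df.rename(index=lambda i: i + 1)

print(upper)
print(partial)
print(shifted)
```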

The rename() method provides an inplace parameter, which defaults to False and copies the underlying data. Pass inplace=True to rename the data in place.
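
The difference between the two settings shows in the return value. This is a small sketch with a hypothetical frame: with the default, a new DataFrame is returned and the original is untouched. With inplace=True, the original is modified and the call returns None.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Default inplace=False: returns a renamed copy, df keeps its labels
renamed = df.rename(columns={'col1': 'c1'})
print(df.columns.tolist())       # ['col1', 'col2']
print(renamed.columns.tolist())  # ['c1', 'col2']

# inplace=True: df itself is modified and the method returns None
result = df.rename(columns={'col1': 'c1'}, inplace=True)
print(result)                    # None
print(df.columns.tolist())       # ['c1', 'col2']
```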


Source: www.cnblogs.com/Summer-skr--blog/p/11704722.html