Pandas indexing is fairly complex and easy to get confused by in practice, so this note collects the common index operations in one place: lookup, conversion between index and columns, sorting, and deduplication.
import pandas as pd
import numpy as np
# Define the DataFrame (named `data` rather than `dict` to avoid shadowing the built-in)
data={'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]}
df=pd.DataFrame(data,index=['one','two','three'])
df
| | a | b | c |
|---|---|---|---|
| one | 1 | 4 | 7 |
| two | 2 | 5 | 8 |
| three | 3 | 6 | 9 |
1. Looking up values in a Series by index
Values can be looked up by index label or by integer position.
s1=df['b']
s1['two']
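s1[['two','one']] # several separate labels go in a list, enclosed in [ ]
s1['two':'three'] # label slice: both endpoints are included
s1[0:2] # positional slice: the right endpoint is excluded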
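The difference between the two kinds of slice is easy to check on a small Series. The sketch below rebuilds column b as a standalone Series (the name `s` is only for illustration) and uses `.loc` and `.iloc` so that the label-based and position-based intent is explicit.

```python
import pandas as pd

# Standalone copy of column 'b' from the example DataFrame (the name `s` is illustrative)
s = pd.Series([4, 5, 6], index=['one', 'two', 'three'], name='b')

by_label = s.loc['two':'three']   # label slice: both endpoints are included
by_position = s.iloc[0:2]         # positional slice: the right endpoint is excluded

print(by_label.tolist())     # [5, 6]
print(by_position.tolist())  # [4, 5]
```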
2. Looking up data in a DataFrame by index
(1) Select columns directly by column label
df['b']
df[['b','c']] # several separate labels go in a list, enclosed in [ ]
(2) .loc selects data by label
- Row labels come first, column labels second
- To select several separate rows or columns, pass a list enclosed in []
- Slices are written without []. Note that a label slice includes its right endpoint
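df.loc[['two','one']] # several rows, row labels given as a list
df.loc['two':'three'] # several rows, row labels given as a slice
df.loc[:,['b','a']] # all rows, several columns given as a list
df.loc[:,'a':'b'] # all rows, several columns given as a slice (a slice follows column order, so 'b':'a' would select nothing)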
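Row and column selectors can also be combined in one `.loc` call. The following is a minimal sketch on a copy of the example data; the names `df_demo`, `subset` and `cell` are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['one', 'two', 'three'],
)

# Row slice plus column list: rows 'one' through 'two' (inclusive), columns 'c' then 'a'
subset = df_demo.loc['one':'two', ['c', 'a']]

# A single row label plus a single column label returns a scalar
cell = df_demo.loc['two', 'b']

print(subset)
print(cell)
```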
(3) .iloc selects data by integer position
- Row position comes first, column position second
- A single value or a slice can be used. Note that a positional slice excludes its right endpoint
df.iloc[1:3,1:3]
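To make the half-open interval concrete, here is a short sketch on a copy of the example data (the variable names are illustrative). It also shows two other positional forms, a negative position and lists of positions.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['one', 'two', 'three'],
)

block = df_demo.iloc[1:3, 1:3]          # rows 1-2 and columns 1-2; position 3 is excluded
last_row = df_demo.iloc[-1]             # negative positions count from the end
picked = df_demo.iloc[[0, 2], [0, 2]]   # lists of positions select non-adjacent rows and columns

print(block)
print(last_row.tolist())
print(picked)
```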
3. Converting between index and columns
(1) Column -> Index
- The inplace parameter controls whether the original DataFrame is modified
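df1=df.set_index('a', inplace=False) # inplace=False (the default) returns a new DataFrame; df is unchanged
df1
df.set_index('a', inplace=True) # inplace=True modifies df directly and returns None (run this second: afterwards 'a' is no longer a column)
df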
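The effect of `inplace` can be checked by looking at what each call returns. This is a minimal sketch with a smaller, illustrative DataFrame (`df_demo`, `indexed` and `result` are not names from the original example).

```python
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Default (inplace=False): a new DataFrame is returned, df_demo still has column 'a'
indexed = df_demo.set_index('a')

# inplace=True: df_demo itself is modified and the call returns None
result = df_demo.set_index('a', inplace=True)

print(list(indexed.columns))  # ['b']
print(result)                 # None
print(df_demo.index.name)     # a
```

A common mistake is writing `df = df.set_index('a', inplace=True)`, which rebinds `df` to None.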
(2) Index -> Column
- The index becomes the first column of the DataFrame
df.reset_index(inplace=True)
df
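How the new column is named depends on whether the index has a name. The sketch below, using illustrative variable names, shows both cases without `inplace` so the original stays untouched.

```python
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['one', 'two', 'three'])

# An unnamed index moves into a new first column called 'index'
flat = df_demo.reset_index()

# A named index keeps its name as the column name, so set_index + reset_index round-trips
restored = df_demo.set_index('a').reset_index()

print(list(flat.columns))      # ['index', 'a', 'b']
print(list(restored.columns))  # ['a', 'b']
```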
4. Sorting by index
- Sort by index in ascending order without overwriting the original data; missing index values are placed first
df.sort_index(ascending=True, inplace=False, na_position='first')
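The example DataFrame has no missing index values, so `na_position` has no visible effect on it. The following sketch uses a hypothetical DataFrame with a NaN in its index to show the difference between `'first'` and `'last'`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index contains one missing value
df_demo = pd.DataFrame({'v': [10, 20, 30]}, index=[2.0, np.nan, 1.0])

nan_first = df_demo.sort_index(ascending=True, na_position='first')
nan_last = df_demo.sort_index(ascending=True, na_position='last')

print(nan_first['v'].tolist())  # [20, 30, 10]
print(nan_last['v'].tolist())   # [30, 10, 20]
print(df_demo['v'].tolist())    # [10, 20, 30] (original order is unchanged)
```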
5. Removing rows with duplicate index values
- keep='first' or keep='last' chooses whether the first or the last occurrence of each duplicated index value is kept
df=df[~df.index.duplicated(keep='first')]
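Since the example DataFrame has a unique index, the line above leaves it unchanged. A small sketch with a hypothetical DataFrame whose index repeats the label 'x' shows what the mask looks like and how `keep` changes the result.

```python
import pandas as pd

# Hypothetical data: the label 'x' appears twice in the index
df_demo = pd.DataFrame({'v': [1, 2, 3, 4]}, index=['x', 'y', 'x', 'z'])

mask = df_demo.index.duplicated(keep='first')  # True for every repeat after the first occurrence
keep_first = df_demo[~mask]
keep_last = df_demo[~df_demo.index.duplicated(keep='last')]

print(mask.tolist())             # [False, False, True, False]
print(keep_first['v'].tolist())  # [1, 2, 4]
print(keep_last['v'].tolist())   # [2, 3, 4]
```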