Pandas index summary: selecting, sorting, and deduplicating data

Pandas indexing is fairly complex and its usage is easy to mix up, so this post summarizes how to select, sort, and deduplicate data by index.

import pandas as pd
import numpy as np

# Define the DataFrame
data={'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]}  # named data so the built-in dict is not shadowed
df=pd.DataFrame(data,index=['one','two','three'])
df

	a	b	c
one	1	4	7
two	2	5	8
three	3	6	9

1. Selecting from a Series by index

Select by index label or by integer position.

s1=df['b']
s1['two']
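s1[['two','one']]  # a list of individual labels must be wrapped in [ ]
s1['two':'three']  # label slice: both endpoints are included
s1[0:2]            # positional slice: the right endpoint is excluded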
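
The two slicing rules above are easy to mix up, so here is a minimal sketch that contrasts them. The Series `s` is a stand-in for `df['b']`, and `.iloc` is used as the explicit positional form; both choices are for illustration only.

```python
import pandas as pd

# Stand-in for df['b'] from the example above
s = pd.Series([4, 5, 6], index=['one', 'two', 'three'])

by_label = s['two':'three']   # label slice: both endpoints are included
by_position = s.iloc[0:2]     # positional slice: the right endpoint is excluded

print(list(by_label.index))     # ['two', 'three']
print(list(by_position.index))  # ['one', 'two']
```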

2. Selecting from a DataFrame by index

(1) Select columns directly by column label

df['b']
df[['b','c']]  # a list of individual labels must be wrapped in [ ]

(2) .loc selects data by label

Write the row labels first, then the column labels.
To select several individual rows or columns, pass a list enclosed in [].
Slices are written without the enclosing []. Note that the right endpoint of a label slice is inclusive.

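df.loc[['two','one']]   # multiple rows, row labels given as a list
df.loc['two':'three']   # multiple rows, row labels given as a slice
df.loc[:,['b','a']] # all rows, multiple columns, column labels given as a list
df.loc[:,'a':'b']   # all rows, multiple columns, column labels given as a slice (follows column order; 'b':'a' would select nothing)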
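
As a small illustration of the row-then-column convention, the sketch below combines a row selector and a column selector in one `.loc` call, and also shows that a column slice written against the column order selects no columns. The variable names `cell`, `block`, and `empty` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

cell = df.loc['two', 'b']                 # one row label, one column label -> scalar
block = df.loc['one':'two', ['c', 'a']]   # inclusive row slice plus a column list
empty = df.loc[:, 'b':'a']                # 'b' comes after 'a', so no columns match

print(cell)         # 5
print(block.shape)  # (2, 2)
print(empty.shape)  # (3, 0)
```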

(3) .iloc selects data by integer position

Write the row position first, then the column position.
Both single values and slices are accepted. Note: the right endpoint of a positional slice is excluded.

df.iloc[1:3,1:3]
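
To make the result of the positional slice concrete, here is a short sketch on the same data. The names `sub` and `single` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

sub = df.iloc[1:3, 1:3]   # rows at positions 1 and 2, columns at positions 1 and 2
single = df.iloc[0, 2]    # first row, third column -> scalar

print(list(sub.index))    # ['two', 'three']
print(list(sub.columns))  # ['b', 'c']
print(single)             # 7
```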

3. Conversion between the index and column

(1) Columns -> Index

The inplace parameter controls whether the original data is overwritten.

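df.set_index('a', inplace=True)  # inplace=True modifies df directly and returns None
df

# inplace=False (the default) returns a new DataFrame and leaves df unchanged.
# Column 'b' is used here because 'a' has already become the index above.
df1=df.set_index('b', inplace=False)
df1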
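
The difference between the two modes can be checked directly. This is a minimal sketch on a fresh copy of the example data; `new_df` and `ret` are names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

new_df = df.set_index('a')            # default inplace=False: returns a new DataFrame
still_has_a = 'a' in df.columns       # True, df itself is unchanged

ret = df.set_index('a', inplace=True) # modifies df directly, returns None

print(still_has_a)       # True
print(ret)               # None
print(list(df.columns))  # ['b', 'c']
```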

(2) Index -> column

The index becomes the first column of the DataFrame.

df.reset_index(inplace=True) 
df
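
A brief sketch of what `reset_index` produces. The `drop=True` variant is not covered in the text above; it is included only as a related option of the same pandas method, for the case where the old index should be discarded rather than kept as a column.

```python
import pandas as pd

df = pd.DataFrame({'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=pd.Index([1, 2, 3], name='a'))

restored = df.reset_index()          # 'a' becomes the first column; new 0..n-1 index
dropped = df.reset_index(drop=True)  # old index is discarded instead of kept

print(list(restored.columns))  # ['a', 'b', 'c']
print(list(restored.index))    # [0, 1, 2]
print(list(dropped.columns))   # ['b', 'c']
```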

4. Sorting by index

Sort by index in ascending order without overwriting the original data; missing index values are placed first.

df.sort_index(ascending=True, inplace=False, na_position='first')
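
The example DataFrame has no missing index values, so the effect of `na_position` is not visible on it. The sketch below uses a small made-up frame with a NaN in its index to show the difference; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing index value
df_nan = pd.DataFrame({'v': [10, 20, 30]}, index=[2.0, np.nan, 1.0])

na_first = df_nan.sort_index(ascending=True, na_position='first')
na_last = df_nan.sort_index(ascending=True, na_position='last')

print(na_first['v'].tolist())  # [20, 30, 10]
print(na_last['v'].tolist())   # [30, 10, 20]
```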

5. Deduplicating data by index

keep='first' or keep='last' chooses whether the first or the last occurrence of each duplicated index value is kept.

df=df[~df.index.duplicated(keep='first')]
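
Since the example DataFrame has a unique index, the following sketch uses a small made-up frame with a repeated label to show what each `keep` option retains. `keep=False`, which marks every occurrence of a duplicated label, is not mentioned above and is added here only for comparison.

```python
import pandas as pd

# Hypothetical data: the label 'x' appears twice
df_dup = pd.DataFrame({'v': [1, 2, 3, 4]}, index=['x', 'y', 'x', 'z'])

keep_first = df_dup[~df_dup.index.duplicated(keep='first')]  # keeps the first 'x'
keep_last = df_dup[~df_dup.index.duplicated(keep='last')]    # keeps the last 'x'
keep_none = df_dup[~df_dup.index.duplicated(keep=False)]     # drops every 'x'

print(keep_first['v'].tolist())  # [1, 2, 4]
print(keep_last['v'].tolist())   # [2, 3, 4]
print(keep_none['v'].tolist())   # [2, 4]
```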