Summary of Indexing Operations on pandas DataFrames
For new pandas users, DataFrame indexing can be confusing. This post walks through each indexing form in detail and then summarizes what the exploration shows.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
                  columns=['one', 'two', 'three', 'four'])
df
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
(1) df[val]
- When val is a single column label, df[val] selects that column and returns a Series.
df['one']
Ohio 0
Colorado 4
Utah 8
New York 12
Name: one, dtype: int32
- When val is a list of column labels, df[val] selects those columns and returns a DataFrame.
df[['one','two']]
          one  two
Ohio        0    1
Colorado    4    5
Utah        8    9
New York   12   13
- When val is an integer slice such as :num, df[val] selects rows. This is a convenience shorthand equivalent to df.iloc[:num], the dedicated positional row selector.
df[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df.iloc[:2] # same result as df[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
df.iloc[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
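Slices passed to plain df[...] can also use index labels rather than positions. The following is a minimal sketch, not part of the original walkthrough: it rebuilds the example frame under the hypothetical name demo (so df itself is left untouched) and compares an integer slice with a label slice. Unlike integer slices, label slices include both endpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Integer slice: rows at positions 1 and 2 (the end position 3 is excluded).
by_position = demo[1:3]

# Label slice: rows from 'Colorado' through 'Utah', both endpoints included.
by_label = demo["Colorado":"Utah"]

print(by_position.index.tolist())
print(by_label.index.tolist())
print(by_position.equals(demo.iloc[1:3]))
```

Both slices pick the same two rows here, but only because 'Utah' sits at position 2; the label form does not depend on row positions.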
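- When val is a boolean DataFrame, df[val] keeps the values where the mask is True and replaces the rest with NaN. On the left side of an assignment, df[val] = x sets only the values where the mask is True.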
df<5
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
df[df<5]
          one  two  three  four
Ohio      0.0  1.0    2.0   3.0
Colorado  4.0  NaN    NaN   NaN
Utah      NaN  NaN    NaN   NaN
New York  NaN  NaN    NaN   NaN
df[df<5]=0;df # modifies df in place, so all outputs from here on reflect this change
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
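The mask behaviour can be checked in a few lines. This is a sketch under stated assumptions: demo is a hypothetical fresh copy of the example frame, and the boolean Series case (which filters whole rows) is an addition not covered in the walkthrough above.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = demo < 5

# Selection with a boolean DataFrame: same shape, False entries become NaN.
masked = demo[mask]
same_as_where = masked.equals(demo.where(mask))

# A boolean Series (one value per row) filters whole rows instead.
big_rows = demo[demo["one"] > 4]

# Assignment through the mask, done on a copy so demo stays unchanged.
zeroed = demo.copy()
zeroed[mask] = 0

print(same_as_where)
print(big_rows.index.tolist())
print(zeroed.loc["Colorado"].tolist())
```

Working on a copy avoids the side effect seen above, where the assignment permanently changes the frame used by later examples.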
(2) df.loc[val]
- When val is a single index label, df.loc[val] selects the corresponding row and returns a Series. When val is a list of index labels, it selects those rows and returns a DataFrame.
df.loc['Colorado']
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
(3) df.loc[:,val]
- When val is a single column label, df.loc[:,val] selects that column and returns a Series. When val is a list of column labels, it selects those columns and returns a DataFrame.
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['two']] # Note: a list always returns a DataFrame, even when it contains a single element.
          two
Ohio        0
Colorado    5
Utah        9
New York   13
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
df[['one','two']] # same result as df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
(3) df.loc[val1,val2]
- val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels. df.loc[val1,val2] selects the data at the intersection of val1 and val2. Either val1 or val2 (or both) may also be :, which selects everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio 0
Utah 8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one 0
two 0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
      one  two
Ohio    0    0
Utah    8    9
df.loc[:,:]
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.loc['Ohio',:]
one 0
two 0
three 0
four 0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
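Besides single labels, lists, and a bare :, df.loc also accepts label slices and boolean masks on either axis. The sketch below illustrates both on demo, a hypothetical fresh copy of the original example frame (before the in-place assignment above); these two forms are additions to the cases listed in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Label slices on both axes; with .loc both endpoints are included.
block = demo.loc["Colorado":"Utah", "two":"three"]

# Boolean Series as the row selector, list of labels as the column selector.
filtered = demo.loc[demo["four"] > 7, ["one", "four"]]

print(block)
print(filtered)
```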
(4) df.iloc[val]
- Works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than row labels.
df.iloc[1]
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
(5) df.iloc[:,val]
- Works like df.loc[:,val], except that val must be an integer or a list of integers giving column positions.
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.iloc[:,1]
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.iloc[:,[1,3]]
          two  four
Ohio        0     0
Colorado    5     7
Utah        9    11
New York   13    15
(6) df.iloc[val1,val2]
- Works like df.loc[val1,val2], except that val1 and val2 must be integers, lists of integers, or :.
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado 6
Utah 10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
          two  three
Colorado    5      6
Utah        9     10
df.iloc[:,[1,2]]
          two  three
Ohio        0      0
Colorado    5      6
Utah        9     10
New York   13     14
df.iloc[[1,2],:]
          one  two  three  four
Colorado    0    5      6     7
Utah        8    9     10    11
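Positional indexing follows ordinary Python conventions: slice ends are excluded and negative integers count from the end. The sketch below shows this on demo, a hypothetical fresh copy of the original example frame, and uses Index.get_loc to translate a label into a position when the two styles need to be mixed.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Integer slices exclude the end position: rows 1-2 and columns 1-2.
block = demo.iloc[1:3, 1:3]

# Negative positions count from the end, as with Python lists.
last_row = demo.iloc[-1]

# Translate labels to positions, then index positionally.
row_pos = demo.index.get_loc("Utah")
col_pos = demo.columns.get_loc("three")
value = demo.iloc[row_pos, col_pos]

print(block)
print(last_row.name)
print(row_pos, col_pos, value)
```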
(7) df.at[val1,val2]
- val1 must be a single index label and val2 must be a single column label.
df.at['Utah','one']
8
df.loc['Utah','one'] # same result as df.at['Utah','one']
8
df.at[['Utah','Colorado'],'one'] # raises an exception, because df.at accepts only single labels
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2538 try:
-> 2539 return engine.get_value(series._values, index)
2540 except (TypeError, ValueError):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']
D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
2140
2141 key = self._convert_key(key)
-> 2142 return self.obj._get_value(*key, takeable=self._takeable)
2143
2144 def __setitem__(self, key, value):
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2543 # use positional
2544 col = self.columns.get_loc(col)
-> 2545 index = self.index.get_loc(index)
2546 return self._get_value(index, col, takeable=True)
2547 _get_value.__doc__ = get_value.__doc__
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3076 'backfill or nearest lookups')
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
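The scalar-only rule applies to assignment as well as lookup. The following sketch, run on demo (a hypothetical fresh copy of the original example frame), sets one cell with df.at, shows that a list key is rejected, and falls back to df.loc for several labels. The exact exception class differs between pandas versions, so the code catches Exception broadly and only prints the class name.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# .at reads and writes exactly one cell.
demo.at["Utah", "one"] = 100

# A list of labels is not a valid key for .at.
# The exception class varies across pandas versions, hence the broad except.
try:
    demo.at[["Utah", "Colorado"], "one"]
    raised = False
except Exception as exc:
    raised = True
    print("rejected with", type(exc).__name__)

# For several labels, use .loc instead.
several = demo.loc[["Utah", "Colorado"], "one"]
print(several)
```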
(8) df.iat[val1,val2]
- Works like df.at, except that val1 and val2 must both be integer positions.
df.iat[2,2]
10
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
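As a quick cross-check, the same cell can be reached by position or by label. This sketch uses demo, a hypothetical fresh copy of the original example frame, so the value at row 2, column 2 is 10 as in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_position = demo.iat[2, 2]        # row position 2, column position 2
by_label = demo.at["Utah", "three"] # the same cell, addressed by labels
via_iloc = demo.iloc[2, 2]          # .iloc also accepts two scalars

# .iat supports scalar assignment too.
demo.iat[0, 0] = -1

print(by_position, by_label, via_iloc)
print(demo.loc["Ohio", "one"])
```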
Conclusion
- In df[val], val can be a column label or a list of column labels, which selects whole columns. As a special case, val can be a slice such as :num, which selects the corresponding rows. val can also be a boolean DataFrame, which selects or sets values by mask.
- df.loc[val] is mainly used to select rows, or combinations of rows and columns. val can be a single row label, a list of row labels, or a pair val1,val2, where each of val1 and val2 is a single label, a list of labels, or :. In the pair form it selects the intersection of the rows in val1 and the columns in val2.
- df.iloc[val] works like df.loc[val], except that val must be given as integer positions, either a single integer or a list of integers.
- df.at[val1,val2] accepts only a single row label and a single column label, and df.iat[val1,val2] accepts only a single row position and a single column position.
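The return types described in these conclusions can be tabulated with a short script. This is a sketch, assuming demo as a hypothetical fresh copy of the example frame and kind as an illustrative helper; neither name comes from the walkthrough above.

```python
import numpy as np
import pandas as pd

# Hypothetical fresh copy of the example frame; "demo" is an illustrative name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)


def kind(obj):
    """Classify a selection result (illustrative helper)."""
    if isinstance(obj, pd.DataFrame):
        return "DataFrame"
    if isinstance(obj, pd.Series):
        return "Series"
    return "scalar"


results = {
    "demo['one']": demo["one"],
    "demo[['one']]": demo[["one"]],
    "demo[:2]": demo[:2],
    "demo.loc['Ohio']": demo.loc["Ohio"],
    "demo.loc[['Ohio']]": demo.loc[["Ohio"]],
    "demo.loc['Ohio', 'one']": demo.loc["Ohio", "one"],
    "demo.iloc[0]": demo.iloc[0],
    "demo.iloc[[0], [0]]": demo.iloc[[0], [0]],
    "demo.at['Ohio', 'one']": demo.at["Ohio", "one"],
    "demo.iat[0, 0]": demo.iat[0, 0],
}

kinds = {expr: kind(value) for expr, value in results.items()}
for expr, result_kind in kinds.items():
    print(f"{expr:<26} -> {result_kind}")
```

The pattern is consistent across all accessors: a single label or position drops a dimension, while a list or slice keeps it.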