pandas loc, iloc, ix, at, iat usage

Project GitHub repository: bitcarmanlee easy-algorithm-interview-and-practice
Stars and comments are welcome; let's learn and improve together.

1. loc usage

loc selects by label: it picks rows by their index label and, optionally, columns by their column name.
iloc selects by integer position: it picks rows and columns by where they sit, counting from 0.

import pandas as pd

def select_test():
    # Build a 10-row DataFrame with string row labels r1..r10.
    a = [i for i in range(10)]
    b = [2*x + 0.1 for x in a]
    data = {"x": a, "y": b}
    tmp = pd.DataFrame(data, index=["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10"])
    print(tmp.index)    # row labels
    print(tmp.columns)  # column names
    print()
    return tmp

Calling the function prints:

Index(['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'], dtype='object')
Index(['x', 'y'], dtype='object')

Use loc to select the first row by its label:

tmp = select_test()
print(tmp.loc["r1"])

Output:

x    0.0
y    0.1
Name: r1, dtype: float64

Use loc to select the first three rows by label, keeping only the x column:

print(tmp.loc[["r1", "r2", "r3"], "x"])
r1    0
r2    1
r3    2
Name: x, dtype: int64
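loc also accepts a label slice, and unlike a Python or iloc slice it includes the end label. The sketch below rebuilds the same toy DataFrame so it runs on its own; the names by_list and by_slice are only illustrative.

```python
import pandas as pd

# Same toy frame as select_test(), rebuilt so the snippet is self-contained.
a = list(range(10))
tmp = pd.DataFrame(
    {"x": a, "y": [2 * v + 0.1 for v in a]},
    index=[f"r{i}" for i in range(1, 11)],
)

by_list = tmp.loc[["r1", "r2", "r3"], "x"]  # explicit list of row labels
by_slice = tmp.loc["r1":"r3", "x"]          # label slice: "r3" IS included

print(by_slice.equals(by_list))  # both select r1, r2, r3
print(len(tmp.iloc[0:3]))        # positional slice: end position 3 is excluded
```

So "r1":"r3" with loc and 0:3 with iloc both return three rows, even though the two slices follow different end-point rules.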

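Calling tmp.loc[1] raises an error, because 1 is an integer position and not a row label. In the pandas version used here the message is as follows (newer pandas releases may raise a KeyError instead):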

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
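The exact exception type depends on the pandas version, so the following minimal sketch catches both KeyError and TypeError. It uses a smaller three-row frame, and the variable raised is just an illustrative name.

```python
import pandas as pd

tmp = pd.DataFrame({"x": [0, 1, 2]}, index=["r1", "r2", "r3"])

try:
    tmp.loc[1]          # 1 is not one of the labels r1..r3
    raised = None
except (KeyError, TypeError) as exc:
    # Which of the two is raised depends on the installed pandas version.
    raised = type(exc).__name__

print(raised)

# To address the second row by position, use iloc instead of loc.
second_row = tmp.iloc[1]
print(second_row["x"])
```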

2. iloc usage

Select the first five rows:

print(tmp.iloc[0:5])
    x    y
r1  0  0.1
r2  1  2.1
r3  2  4.1
r4  3  6.1
r5  4  8.1

Select the second column of the first five rows (column positions start at 0, so the second column is position 1):

print(tmp.iloc[0:5, 1:2])
      y
r1  0.1
r2  2.1
r3  4.1
r4  6.1
r5  8.1

Passing a column name to iloc does not work:

print(tmp.iloc[0:5, "x"])

The line above raises an error:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

The reason is simple: iloc accepts only integer positions of rows and columns, not their labels.
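If only the column name is known, one way to keep using iloc is to translate the name into a position first with Index.get_loc. This is a minimal sketch on the same toy frame; x_pos and first_five_x are illustrative names.

```python
import pandas as pd

a = list(range(10))
tmp = pd.DataFrame(
    {"x": a, "y": [2 * v + 0.1 for v in a]},
    index=[f"r{i}" for i in range(1, 11)],
)

# Translate the column name into its integer position.
x_pos = tmp.columns.get_loc("x")

# Now the selection is purely positional, which is what iloc expects.
first_five_x = tmp.iloc[0:5, x_pos]
print(first_five_x.tolist())
```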

3. ix

In old versions of pandas, ix was a hybrid of loc and iloc that supported both positional selection and selection by label.
It was deprecated in pandas 0.20.0 and removed in pandas 1.0. Personally, I think dropping it was the right call: an API that is too flexible inevitably leads to inconsistent code and poor readability.
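Code that relied on ix to mix positions and labels can usually be rewritten with loc alone, by turning the positional part into labels through tmp.index or tmp.columns. The following is one possible sketch, not the only migration path; the variable names are illustrative.

```python
import pandas as pd

a = list(range(10))
tmp = pd.DataFrame(
    {"x": a, "y": [2 * v + 0.1 for v in a]},
    index=[f"r{i}" for i in range(1, 11)],
)

# Rows by position, column by name: look up the row labels first.
first_three_x = tmp.loc[tmp.index[0:3], "x"]

# Row by label, column by position: look up the column name first.
r3_second_col = tmp.loc["r3", tmp.columns[1]]

print(first_three_x.tolist())
print(r3_second_col)
```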

4. Index quick selection

Plain [] indexing offers a shortcut for selecting rows or columns:

print(tmp[0:5])
print(tmp[['x', 'y']])
    x    y
r1  0  0.1
r2  1  2.1
r3  2  4.1
r4  3  6.1
r5  4  8.1
     x     y
r1   0   0.1
r2   1   2.1
r3   2   4.1
r4   3   6.1
r5   4   8.1
r6   5  10.1
r7   6  12.1
r8   7  14.1
r9   8  16.1
r10  9  18.1

The first line selects the first five rows; the second line selects the x and y columns.
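What [] returns depends on the kind of key it is given. The sketch below, on the same toy frame, compares a slice, a list of column names, and a single column name; rows, cols and one_col are illustrative names.

```python
import pandas as pd

a = list(range(10))
tmp = pd.DataFrame(
    {"x": a, "y": [2 * v + 0.1 for v in a]},
    index=[f"r{i}" for i in range(1, 11)],
)

rows = tmp[0:5]         # a slice selects rows
cols = tmp[["x", "y"]]  # a list of names selects columns, result is a DataFrame
one_col = tmp["x"]      # a single name selects one column, result is a Series

print(rows.shape, cols.shape, type(one_col).__name__)
```

Because the same brackets switch between rows and columns depending on the key, loc and iloc tend to be the clearer choice once a selection gets more involved.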

5. at/iat usage

at quickly selects a single element of the DataFrame by row label and column name:

print(tmp.at["r3", "x"])

Output is

2

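Passing an integer position instead of a row label raises an error (the message below comes from the pandas version used here; newer releases may raise a KeyError instead):

print(tmp.at[3, "x"])

ValueError: At based indexing on an non-integer index can only have non-integer indexers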

Like iloc, iat locates an element by integer position:

print(tmp.iat[0, 0])

Output is

0
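at and iat can be thought of as the single-element counterparts of loc and iloc, and they can write a value as well as read one. A small sketch on the same toy frame (the values 100 and -1.0 are arbitrary):

```python
import pandas as pd

a = list(range(10))
tmp = pd.DataFrame(
    {"x": a, "y": [2 * v + 0.1 for v in a]},
    index=[f"r{i}" for i in range(1, 11)],
)

# Reading: at mirrors loc, iat mirrors iloc, for exactly one cell.
same_by_label = tmp.at["r3", "x"] == tmp.loc["r3", "x"]
same_by_position = tmp.iat[2, 0] == tmp.iloc[2, 0]
print(same_by_label, same_by_position)

# Writing: assign to a single cell by label or by position.
tmp.at["r3", "x"] = 100
tmp.iat[0, 1] = -1.0
print(tmp.loc["r3", "x"], tmp.iloc[0, 1])
```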

Origin blog.csdn.net/bitcarmanlee/article/details/110878442