[Repost] Pandas study notes (b): selecting data

Original: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/3-2-pd-indexing/ (with some deletions)

All of the examples below are based on the following 6x4 matrix of data:

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates, columns=['A','B','C','D'])

"""
             A   B   C   D
2013-01-01   0   1   2   3
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23
"""

Simple selection

Using subscripts and label indexes

To select a column of a DataFrame, either of the following two forms works; both return the same result:

print(df['A'])
print(df.A)

"""
2013-01-01     0
2013-01-02     4
2013-01-03     8
2013-01-04    12
2013-01-05    16
2013-01-06    20
Freq: D, Name: A, dtype: int64
"""
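As a small sketch of the point above, the snippet below checks that the two forms return the same column. The second frame, df_clash, and its column names are hypothetical, added only to illustrate a case where attribute access does not reach the column because the name clashes with a DataFrame method or is not a valid Python identifier.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Both forms return the same Series for column 'A'.
same = df['A'].equals(df.A)

# Hypothetical frame: 'count' clashes with DataFrame.count, and
# 'my col' is not a valid identifier, so only [] reaches these columns.
df_clash = pd.DataFrame({'count': [1, 2], 'my col': [3, 4]})
count_col = df_clash['count']              # the column
count_attr_is_method = callable(df_clash.count)  # the method, not the column
my_col = df_clash['my col']
```

For this reason the bracket form is probably the safer habit when column names are not under your control.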

To select a span of several rows:

print(df[0:3])
 
"""
            A  B   C   D
2013-01-01  0  1   2   3
2013-01-02  4  5   6   7
2013-01-03  8  9  10  11
"""

print(df['20130102':'20130104'])

"""
             A   B   C   D
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
"""

A slice such as df[3:3] returns an empty object. The label slice df['20130102':'20130104'] selects the rows between the labels 2013-01-02 and 2013-01-04, including both end labels.
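To make the difference concrete, here is a minimal sketch that rebuilds the same df and compares a positional slice with a label slice. The variable names are mine, not from the original tutorial.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

empty = df[3:3]                        # start == stop, so no rows are selected
by_position = df[1:4]                  # positions 1, 2, 3 (position 4 is excluded)
by_label = df['20130102':'20130104']   # both end labels are included

print(len(empty))
print(by_position.equals(by_label))
```

To reach the row for 2013-01-04 the positional slice has to stop at 4, whereas the label slice names 2013-01-04 directly.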

While experimenting, I also found that df['2013-01-04'] and df['20130104'] both raise an error; the message says that no such key exists. So I ran some further experiments:

df2 = pd.DataFrame([[0,1],[2,3]],index=['a','b'], columns=['b','a'])
print(df2)

"""
   b  a
a  0  1
b  2  3
"""
  • Experiment 1
print(df2.a)
"""
a    1
b    3
Name: a, dtype: int64
"""

print(df2['b'])
"""
a    0
b    2
Name: b, dtype: int64
"""

As you can see, this form selects columns.

  • Experiment 2
print(df2['a':])
"""
   b  a
a  0  1
b  2  3
"""

print(df2['b':])
"""
   b  a
b  2  3
"""

As you can see, using : in this way selects rows.

Of course, specifying a range by label names is rather cumbersome. It also has an obvious drawback: if two labels are identical, you cannot say which of them the range should start from. So we can also specify the range by position; in this example, df2[1:] is equivalent to df2['b':].
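The sketch below tries both claims. The frame df_dup with a repeated, unsorted label is a hypothetical example of mine; as far as I know, pandas raises a KeyError for a label slice that starts at such a label, but treat that as an observation about current versions, not a guarantee.

```python
import pandas as pd

df2 = pd.DataFrame([[0, 1], [2, 3]], index=['a', 'b'], columns=['b', 'a'])

# Positional and label slices that pick the same rows here.
same_rows = df2[1:].equals(df2['b':])

# Hypothetical frame with a repeated, unsorted row label 'a'.
df_dup = pd.DataFrame({'x': [0, 1, 2]}, index=['a', 'b', 'a'])

try:
    df_dup['a':]
    label_slice_failed = False
except KeyError:
    # pandas cannot tell which 'a' should start the range.
    label_slice_failed = True

# A positional slice stays unambiguous even with duplicate labels.
from_second_row = df_dup[1:]
```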

The two methods also differ in how they treat the last element: a numeric slice excludes the end position, whereas a label slice includes the end label. An example makes this clearer:

  • Experiment 3
print(df2['a':'b'])
"""
   b  a
a  0  1
b  2  3
"""

print(df2[0:1])
"""
   b  a
a  0  1
"""

The indexing methods described above can be confusing, so I do not recommend them. Instead, consider the following ways of selecting data.
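As a rough preview, the three experiments could be rewritten with explicit indexers, so that each line says whether it means a label or a position. This is only a sketch of one possible mapping, and the variable names are mine.

```python
import pandas as pd

df2 = pd.DataFrame([[0, 1], [2, 3]], index=['a', 'b'], columns=['b', 'a'])

col_a = df2.loc[:, 'a']          # column labelled 'a' (same data as df2['a'])
row_a = df2.loc['a']             # row labelled 'a'
rows_by_label = df2.loc['a':'b'] # label slice: end label 'b' is included
rows_by_pos = df2.iloc[0:1]      # position slice: end position 1 is excluded
```

Because the row labels and column labels in df2 share the names 'a' and 'b', the explicit forms also remove any doubt about which axis is being addressed.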

Selecting by label: loc

We can use loc to select data by label, which means position numbers can no longer be used here. The examples below select one row by its label, select all rows (: stands for all rows) for certain columns, and select one row together with several columns:

print(df.loc['20130102'])
"""
A    4
B    5
C    6
D    7
Name: 2013-01-02 00:00:00, dtype: int64
"""

print(df.loc[:,['A','B']]) 
"""
             A   B
2013-01-01   0   1
2013-01-02   4   5
2013-01-03   8   9
2013-01-04  12  13
2013-01-05  16  17
2013-01-06  20  21
"""

print(df.loc['20130102',['A','B']])
"""
A    4
B    5
Name: 2013-01-02 00:00:00, dtype: int64
"""
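The examples above use single labels and lists of labels. As a small extension (not in the original tutorial), loc also accepts label slices on both axes, and both ends of a label slice are included:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Rows 2013-01-02 through 2013-01-04 and columns 'B' through 'C', ends included.
block = df.loc['20130102':'20130104', 'B':'C']

# A single row label plus a single column label gives one scalar value.
single = df.loc['20130102', 'A']

print(block)
print(single)
```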

Selecting by position: iloc

We can also select by position with iloc. It lets us pick the data we need by position in several ways: a single element, a contiguous range, or rows that are not adjacent.

print(df.iloc[3,1])
# 13

print(df.iloc[3:5,1:3])
"""
             B   C
2013-01-04  13  14
2013-01-05  17  18
"""

print(df.iloc[[1,3,5],1:3])
"""
             B   C
2013-01-02   5   6
2013-01-04  13  14
2013-01-06  21  22

"""

The three examples above select, in turn, a single element, a contiguous block, and a set of non-adjacent rows.
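Since iloc follows ordinary Python slicing rules, negative positions and step values should work as well. A brief sketch, with variable names of my own choosing:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

last_row = df.iloc[-1]         # last row, counted from the end
first_two = df.iloc[:2, :]     # positions 0 and 1; position 2 is excluded
every_other = df.iloc[::2, 0]  # every second row of the first column

print(last_row.tolist())
print(every_other.tolist())
```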

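ix: combining loc and iloc

We can also mix the two approaches with ix. The example below selects the first three rows of columns 'A' and 'C'. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so this example only runs on older versions.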

print(df.ix[:3,['A','C']])
"""
            A   C
2013-01-01  0   2
2013-01-02  4   6
2013-01-03  8  10
"""
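Because ix is not available in current pandas releases (it was removed in 1.0), here is a sketch of how the same mixed selection might be written with iloc or loc. Translating the column labels with get_indexer, or the row positions with df.index, is one possible approach and not necessarily the only one.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Option 1: stay positional, translating the column labels to positions.
col_positions = df.columns.get_indexer(['A', 'C'])
by_iloc = df.iloc[:3, col_positions]

# Option 2: stay label based, translating the row positions to labels.
by_loc = df.loc[df.index[:3], ['A', 'C']]

print(by_iloc)
```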

Selecting by condition

Finally, we can select with a conditional expression (Boolean indexing): we state a condition and get back every row that satisfies it.

print(df[df.A>8])
"""
             A   B   C   D
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23
"""
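Conditions can also be combined. The sketch below goes slightly beyond the original example: it uses & and | with parentheses around each comparison, and passes a condition to loc so that rows and columns are filtered together.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Both conditions must hold; each comparison needs its own parentheses.
both = df[(df.A > 8) & (df.D < 23)]

# Either condition may hold.
either = df[(df.A < 4) | (df.A > 16)]

# Filter rows by condition and pick columns in one step.
subset = df.loc[df.A > 8, ['B', 'C']]

print(both)
print(either)
print(subset)
```

Python's and / or keywords do not work on a Series here, which is why the element-wise operators & and | are used.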


MARSGGBO Original




2019-10-30 11:06:08



Origin www.cnblogs.com/marsggbo/p/11764013.html