Choosing rows by values in a DataFrame

JJJohn :

A post gives a way to choose rows by column value.

Here is a DataFrame:

            0           1
0  877.443401  808.520962
1  826.300620  848.761594
2  824.403359  861.395174
3  866.732033  804.494156
4  853.461260  874.307851
5  822.906499  830.102249
6  852.605652  863.602725
7  893.421600  825.032893
8  863.768363  862.298227
9  899.976622  864.111539

With this code, df[df.columns[[1]]]>850, I got:

    1
0   False
1   False
2   True
3   False
4   True
5   False
6   True
7   False
8   True
9   True

When I run df.loc[(df[df.columns[[1]]]>850)], I get an error:

ValueError                                Traceback (most recent call last)
<ipython-input-36-8a159ef0cec2> in <module>()
----> 1 df.loc[(df[df.columns[[1]]]>850)]

This code, df[df[df.columns[[1]]]>850], gives:

    0   1
0   NaN NaN
1   NaN NaN
2   NaN 861.395174
3   NaN NaN
4   NaN 874.307851
5   NaN NaN
6   NaN 863.602725
7   NaN NaN
8   NaN 862.298227
9   NaN 864.111539

This is close, but what I am trying to get is a new DataFrame consisting of the rows at [2,4,6,8,9].

How can I do that? Thanks to anyone who gives some inspiration.

Bishwarup Bhattacharjee :

df['a'] returns a pd.Series, while df[['a']] returns a pd.DataFrame whose only column is 'a'. For your problem:

Using loc

new_df = df.loc[df[1] > 850].copy()
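
To illustrate why the Series form works where the DataFrame form did not, here is a minimal sketch that rebuilds the DataFrame from the question (values copied from the post) and compares the two masks. The variable names series_mask, frame_mask and masked are only for illustration.

```python
import pandas as pd

# DataFrame from the question, with int column labels 0 and 1
df = pd.DataFrame({
    0: [877.443401, 826.300620, 824.403359, 866.732033, 853.461260,
        822.906499, 852.605652, 893.421600, 863.768363, 899.976622],
    1: [808.520962, 848.761594, 861.395174, 804.494156, 874.307851,
        830.102249, 863.602725, 825.032893, 862.298227, 864.111539],
})

# Single brackets give a 1-D boolean Series: one True/False per row
series_mask = df[1] > 850

# Double brackets (like df[df.columns[[1]]]) give a boolean DataFrame
frame_mask = df[[1]] > 850

# A Series mask selects whole rows
new_df = df.loc[series_mask].copy()

# A DataFrame mask keeps the shape and blanks out non-matching cells
masked = df[frame_mask]

print(new_df)
print(masked)
```

With the Series mask, new_df holds only rows 2, 4, 6, 8 and 9. With the DataFrame mask, all ten rows are kept and the cells that do not match become NaN, which is the output shown in the question.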

Using query (this assumes the column has a str name such as 'a')

new_df = df.query('a > 850')

It is customary to use str column names instead of int. For example, the query method does not work directly with int column names (a bare 1 in the expression is read as the number 1, not as a column label), and int column names can lead to plenty of other surprising behaviour.
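
As a rough sketch of that advice, the int labels can be renamed to strings before calling query. The names c0 and c1 below are hypothetical, chosen only for this example, and the small DataFrame uses the first four rows from the question.

```python
import pandas as pd

# First four rows of the DataFrame from the question
df = pd.DataFrame({
    0: [877.443401, 826.300620, 824.403359, 866.732033],
    1: [808.520962, 848.761594, 861.395174, 804.494156],
})

# Rename the int labels to str labels (c0 and c1 are made-up names)
named = df.rename(columns={0: "c0", 1: "c1"})

# query can now refer to the column by name
new_df = named.query("c1 > 850")

print(new_df)
```

Here only row 2 has a c1 value above 850, so new_df contains that single row.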
