Selecting dataframe rows based on multiple columns, where new functions should be created to handle conditions in some columns

A. Haidar :

I have a dataframe with multiple columns, and I want to select rows based on conditions in several of them. Assume the dataframe has four columns:

import pandas as pd
di = {"A": [1, 2, 3, 4, 5],
      "B": ['Tokyo', 'Madrid', 'Professor', 'helsinki', 'Tokyo Oliveira'],
      "C": ['250', '200//250', '250//250//200', '12', '200//300'],
      "D": ['Left', 'Right', 'Left', 'Right', 'Right']}
data = pd.DataFrame(di)

I want to select rows with Tokyo in column B, a value above 200 in column C, and Left in column D, so that only the first row is selected. Column C needs its own function, because when an entry contains a list separated by //, only its first value should be checked.

I assume this can be handled with the following function:

def check_200(thecolumn):
    # Build one True/False flag per entry of the column
    thelist = []
    for i in thecolumn:
        f = i
        if "//" in f:
            # split based on // and keep only the first value
            z = f.split("//")
            f = z[0]

        f = float(f)
        if f > 200.00:
            thelist.append(True)
        else:
            thelist.append(False)
    return thelist

Then, I will create the multiple conditions:

selecteddata = data[(data.B.str.contains("Tokyo")) &
                    (data.D.str.contains("Left")) &
                    (check_200(data.C))]

Is this the best way to do it, or is there an easier pandas function that can handle such requirements?

Bruno Mello :

I don't think there is a more pythonic way to do this, but I think this is what you want:

# .str[0] takes the first piece of each split; entries without "//" stay whole
bool_idx = ((data.B.str.contains("Tokyo")) &
            (data.D.str.contains("Left")) &
            (data.C.str.split("//").str[0].astype(float) > 200.00))

selecteddata=data[bool_idx]
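
For reference, here is a self-contained sketch of this vectorized approach on the sample data from the question. It assumes every entry in column C is a string whose first //-separated piece parses as a number. The intermediate name first_value is only illustrative and does not appear in the question.

```python
import pandas as pd

data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["Tokyo", "Madrid", "Professor", "helsinki", "Tokyo Oliveira"],
    "C": ["250", "200//250", "250//250//200", "12", "200//300"],
    "D": ["Left", "Right", "Left", "Right", "Right"],
})

# First "//"-separated piece of each entry in C, converted to float.
# An entry without "//" splits into a one-element list, so .str[0] still works.
first_value = data["C"].str.split("//").str[0].astype(float)

# Combine the three conditions into one boolean mask.
mask = (
    data["B"].str.contains("Tokyo")
    & data["D"].str.contains("Left")
    & (first_value > 200.0)
)

selecteddata = data[mask]
print(selecteddata)
```

On this sample only the first row (index 0) passes all three conditions. Note that str.contains("Tokyo") also matches 'Tokyo Oliveira'; if an exact match is wanted, data["B"] == "Tokyo" may be the better test.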
