Python List to Dataframe with conditions

Si_CPyR :

I have a long list (sample below)

df_list = ['Joe',
 'UK',
 'Buyout',
 '10083',
 '4323',
 'http://info2.com',
 'Linda',
 'US',
 'Liquidate',
 '97656',
 '1223',
 'http://global.com',
 '[email protected]'           
          ]

As you can see, the list contains information about individuals (Joe and Linda). The problem is that some observations (Joe in this example) are missing the 7th element, which is the entity's email address, while others (Linda) have it populated.

I want to turn this list into a dataframe with 7 columns (below). For observations that do not have a valid email address (the 7th element does not contain "@"), I want the EMAIL column to hold a null/empty value rather than the next element, which is actually the next observation's NAME.

cols = ['NAME'
,'COUNTRY'
,'STRATEGIES'
,'TOTAL FUNDS'
,'ESTIMATED PAYOFF'
,'WEBSITE'
,'EMAIL']

So far, this is where I am:

big_list = []  #intention is to append N (number of unique entity) small_lists into a big_list and call pd.DataFrame(big_list)
small_list = [] #intention is to create a small_list for each observation/entity, containing 7 values, including email or null if empty
for element in df_list:
    small_list.append(element)
if ("@" not in small_list):
    small_list[-1] = None

Any help would be highly appreciated! Thanks

kederrac :

You could use a generator that yields one row per record and checks whether the element in the 7th position contains "@":

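import numpy as np
import pandas as pd

def gen_batch(df_list):
    # i points at the slot where the optional email (7th field) would sit
    i = 6
    while i <= len(df_list):
        if i < len(df_list) and '@' in df_list[i]:
            # email present: the record spans 7 elements
            yield df_list[i-6: i+1]
            i += 7
        else:
            # email missing: pad the 6 fields with NaN (pd.np was removed in pandas 2.0)
            yield df_list[i-6: i] + [np.nan]
            i += 6

pd.DataFrame(gen_batch(df_list), columns=cols)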
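For anyone who wants to run the idea end to end, here is a minimal self-contained sketch of the same generator approach. It makes a few assumptions that are not in the question: the name `sample`, the `n_fixed` parameter, and the address `linda@example.com` are placeholders made up for illustration. It also assumes that only an email can contain "@" in the position right after the six fixed fields.

```python
import numpy as np
import pandas as pd

cols = ['NAME', 'COUNTRY', 'STRATEGIES', 'TOTAL FUNDS',
        'ESTIMATED PAYOFF', 'WEBSITE', 'EMAIL']

# Hypothetical sample: the email address is a made-up placeholder.
sample = ['Joe', 'UK', 'Buyout', '10083', '4323', 'http://info2.com',
          'Linda', 'US', 'Liquidate', '97656', '1223', 'http://global.com',
          'linda@example.com']

def gen_batch(items, n_fixed=6):
    """Yield rows of n_fixed fields plus an email slot (NaN when absent)."""
    i = 0
    while i + n_fixed <= len(items):
        row = items[i:i + n_fixed]   # slicing copies, so `items` is not mutated
        i += n_fixed
        if i < len(items) and '@' in items[i]:
            row.append(items[i])     # the next element looks like an email
            i += 1
        else:
            row.append(np.nan)       # no email: leave the slot empty
        yield row

df = pd.DataFrame(list(gen_batch(sample)), columns=cols)
print(df[['NAME', 'EMAIL']])
```

The trade-off is that the "@" test is only a heuristic. If a NAME value could ever contain "@", it would be mistaken for the previous record's email, so a stricter check may be needed for real data.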

