Pandas Dataframe - Split string into multiple columns

zealous :

I am new to the Pandas framework. I have searched for a solution to my issue but did not find much help online.

I have a string column, shown below, and I want to split it into separate columns. I have tried splitting it, but the result is not the output I need.

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*

Here is the code I used to split the string, which did not give me the expected output:

df = data["Total Visitor"].str.split(",", n = 1, expand = True)
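To see why this falls short, here is a small sketch that rebuilds the sample column (assuming it is named "Total Visitor" and that items are separated by ", ", as in the table above) and runs the split. With n=1 pandas splits each row at most once, so everything after the first comma stays together in the second column. Even without n, expand=True only produces columns by position, so items of different visitor types (for example "2x Adult" and "5x Adult + Audio Guide") end up in the same column.

```python
import pandas as pd

# Sample data reconstructed from the question's table
data = pd.DataFrame({"Total Visitor": [
    "2x Adult, 1x Adult + Audio Guide",
    "2x Adult, 2x Youth, 1x Children",
    "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
]})

# n=1 allows at most one split per row: column 1 keeps the rest of the string
parts = data["Total Visitor"].str.split(",", n=1, expand=True)
print(parts)

# Splitting on every ", " gives columns by position, not by visitor type;
# shorter rows are padded with missing values
by_position = data["Total Visitor"].str.split(", ", expand=True)
print(by_position)
```

Because the position of an item in the string says nothing about its type, the columns have to be keyed by the item name rather than by position.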

After splitting, my expected output is the following table:

*------------------------------------------------------------------------------------------------------------------*
| Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*------------------------------------------------------------------------------------------------------------------*
| 2x Adult | 1x Adult + Audio Guide | -        | -           | -                         | -                       |
| 2x Adult | -                      | 2x Youth | 1x Children | -                         | -                       |
| -        | 5x Adult + Audio Guide | -        | -           | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*------------------------------------------------------------------------------------------------------------------*

How can I do this? Any help or guidance would be great.

jezrael :

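The idea is to create a list of dictionaries, one per row, whose keys are the items with the leading count removed by the regex ^\d+x\s+ (^ is the start of the string, \d+ is one or more digits, x is the literal letter x and \s+ is one or more whitespace characters), and then pass that list to the DataFrame constructor: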

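import re
import pandas as pd

# For each row, split the string into items and key each item by its name
# with the leading count (e.g. "2x ") stripped; the raw string keeps \d and \s
# from being read as string escapes
L = [dict([(re.sub(r'^\d+x\s+', '', y), y) for y in x.split(', ')]) for x in df['Total Visitor']]

# Missing keys become NaN, which are then replaced with '-'
df = pd.DataFrame(L).fillna('-')
print(df)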
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  
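To illustrate what the regex does on its own, here is a minimal sketch that applies it to single items. The helper name visitor_type is hypothetical, introduced only for this illustration. Note that re.sub returns the string unchanged when the pattern does not match, so an item without a count prefix keeps its full text as the key.

```python
import re

# Leading count: one or more digits, the letter "x", then whitespace
COUNT_PREFIX = re.compile(r'^\d+x\s+')

def visitor_type(item):
    """Hypothetical helper: return the item with its leading count removed."""
    return COUNT_PREFIX.sub('', item)

items = "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide".split(', ')
print([visitor_type(y) for y in items])
# ['Adult + Audio Guide', 'Children + Audio Guide', 'Senior + Audio Guide']
print(visitor_type("12x Youth"))  # Youth
print(visitor_type("Adult"))      # Adult (no prefix, so nothing is removed)
```

Because \d+ matches any number of digits, counts of 10 or more are handled the same way as single-digit counts.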

Another, similar idea is to build the column names (the dictionary keys) by splitting each item on 'x ':

# maxsplit=1 splits only at the first 'x ', so names that contain 'x ' stay intact
L = [dict([(y.split('x ', 1)[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
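As a quick check, here is a sketch that runs both variants on the sample data (reconstructed from the question, assuming ", " separates items) and compares the results. Here the second variant limits the split to the first 'x ' with maxsplit=1.

```python
import re
import pandas as pd

df = pd.DataFrame({"Total Visitor": [
    "2x Adult, 1x Adult + Audio Guide",
    "2x Adult, 2x Youth, 1x Children",
    "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
]})

# Variant 1: key is the item with the leading count removed by the regex
L1 = [dict([(re.sub(r'^\d+x\s+', '', y), y) for y in x.split(', ')]) for x in df['Total Visitor']]
# Variant 2: key is the text after the first 'x '
L2 = [dict([(y.split('x ', 1)[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

out1 = pd.DataFrame(L1).fillna('-')
out2 = pd.DataFrame(L2).fillna('-')

print(out1.equals(out2))  # True
print(list(out1.columns))
```

Columns appear in the order their keys are first seen across the rows. The two variants differ on unexpected input: an item without a count, such as "Adult", is kept whole by the regex version, while "Adult".split('x ', 1) returns a single-element list, so indexing [1] raises an IndexError.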
