I am new to pandas. I have searched for a solution to my issue but did not find much help online.
I have a string column, shown below, that I want to convert into separate columns. I tried splitting it, but the result is not in the shape I need.
*-----------------------------------------------------------------------------*
| Total Visitor |
*-----------------------------------------------------------------------------*
| 2x Adult, 1x Adult + Audio Guide |
| 2x Adult, 2x Youth, 1x Children |
| 5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*
Here is the code I used to split the string, but it did not give me the expected output:
df = data["Total Visitor"].str.split(",", n = 1, expand = True)
My expected output, after splitting the string, is the following table:
*------------------------------------------------------------------------------------------------------------------*
| Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*------------------------------------------------------------------------------------------------------------------*
| 2x Adult | 1x Adult + Audio Guide | -        | -           | -                         | -                       |
| 2x Adult | -                      | 2x Youth | 1x Children | -                         | -                       |
| -        | 5x Adult + Audio Guide | -        | -           | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*------------------------------------------------------------------------------------------------------------------*
How can I do this? Any help or guidance would be great.
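The idea is to create a list of dictionaries, one per row, whose keys are the entries with the leading count (the number followed by x) removed by the regex ^\d+x\s+ (^ is the start of the string, \d+ is one or more digits, and \s+ is one or more whitespace characters), and then pass that list to the DataFrame constructor: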
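import re
import pandas as pd

# Key: the entry without its leading count; value: the original entry.
# The pattern is a raw string so that \d and \s reach the regex engine unchanged.
L = [dict([(re.sub(r'^\d+x\s+', '', y), y) for y in x.split(', ')]) for x in df['Total Visitor']]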
df = pd.DataFrame(L).fillna('-')
print (df)
Adult Adult + Audio Guide Youth Children \
0 2x Adult 1x Adult + Audio Guide - -
1 2x Adult - 2x Youth 1x Children
2 - 5x Adult + Audio Guide - -
Children + Audio Guide Senior + Audio Guide
0 - -
1 - -
2 1x Children + Audio Guide 1x Senior + Audio Guide
Another similar idea is to split each entry once on 'x ' and use the second part as the dictionary key, which becomes the column name:
L = [dict([(y.split('x ', 1)[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]
df = pd.DataFrame(L).fillna('-')
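One thing to watch with the split-based variant: if a label itself contained 'x ' (the sample data has none), an unlimited split would cut the label short. Below is a minimal sketch of the difference that limiting the split to the first occurrence makes. The label "Box Seat" and the names `data`, `label`, and `out` are hypothetical, invented only for this illustration.

```python
import pandas as pd

# Hypothetical data: "Box Seat" is invented to show a label that contains "x ".
data = pd.DataFrame({"Total Visitor": ["2x Adult, 1x Box Seat", "3x Box Seat"]})


def label(entry, limit):
    """Return the text after the count prefix; limit=-1 means no split limit."""
    return entry.split("x ", limit)[1]


print(label("1x Box Seat", -1))  # unlimited split also breaks inside "Box Seat"
print(label("1x Box Seat", 1))   # splitting once keeps the whole label

# Build the wide table with the split limited to the first "x ".
rows = [
    {label(entry, 1): entry for entry in cell.split(", ")}
    for cell in data["Total Visitor"]
]
out = pd.DataFrame(rows).fillna("-")
print(out)
```

For the data in the question both forms give the same columns, so this only matters if new ticket types are added later.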