Split a column of data into multiple columns and clean up redundant symbols

Process a column of data

Let's take a look at the data format first. The data has only one column, and each value is wrapped in square brackets with its items enclosed in single quotation marks. We want to remove these symbols and split each value into multiple columns.
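To make the starting point concrete, here is a minimal sketch of a frame with that shape. The values are hypothetical (the real data set is not reproduced in this post); the only assumptions are a single column labelled 0 and a default integer index.

```python
import pandas as pd

# Hypothetical sample values; the real data set is not shown in this post.
data = pd.DataFrame({0: ["['a', 'b', 'c', 'd', 'e']",
                         "['f', 'g', 'h', 'i', 'j']"]})

print(data.shape)         # (rows, columns) of the frame
print(type(data.loc[0]))  # a single row comes back as a pandas Series
print(data.loc[0][0])     # the raw string, brackets and quotes included
```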

Data processing

1. First, note that each row taken out of the DataFrame is a pandas.core.series.Series, which is handled as a whole. To replace the square brackets and single quotation marks, we have to pull the string out of that row with [0] (temp[0] in the code) rather than call replace on the row itself. The code is as follows: we loop over the rows, clean each string with the replace function, and then write the processed string back over the original data.

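for i in range(len(data)):
    temp = data.loc[i]
    # Strip the single quotes and the left/right square brackets
    # (if you know a more concise way, please leave a comment)
    temp = temp[0].replace("'", "")
    temp = temp.replace("[", "")
    temp = temp.replace("]", "")
    # Write the cleaned string back over the original row
    data.loc[i] = temp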
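As one possibly more concise alternative to the loop, the same cleanup can be sketched with the vectorized Series.str.replace and a regular-expression character class. This is a swapped-in technique, not the code used above, and the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one column (labelled 0) of bracketed, quoted strings.
data = pd.DataFrame({0: ["['a', 'b', 'c']", "['d', 'e', 'f']"]})

# Remove every "[", "]" and "'" in a single pass over the whole column.
# regex=True makes the pattern a character class instead of a literal string.
data[0] = data[0].str.replace(r"[\[\]']", "", regex=True)

print(data[0].tolist())  # ['a, b, c', 'd, e, f']
```

The spaces after the commas are left untouched here, just as in the loop version.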

After processing, each row holds a plain comma-separated string with no square brackets or single quotation marks.

2. Then use str.split() to split the data on commas and name the resulting columns A, B, C, and so on. data[0] means the split is applied to column 0, and expand=True spreads the pieces across separate columns.

df = data[0].str.split(',', expand=True).rename(
    columns={0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'})
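Putting both steps together, a minimal end-to-end sketch might look like the following. The sample values are hypothetical, and two details go beyond the original code: each piece is stripped of surrounding whitespace (splitting "a, b" on "," would otherwise leave " b" with a leading space), and the column names are generated from the alphabet instead of being typed out.

```python
import string

import pandas as pd

# Hypothetical sample data in the same shape as the original single column.
data = pd.DataFrame({0: ["['1', '2', '3', '4', '5']",
                         "['6', '7', '8', '9', '10']"]})

# Step 1: drop the square brackets and single quotes.
cleaned = data[0].str.replace(r"[\[\]']", "", regex=True)

# Step 2: split on commas into separate columns, then trim the space
# that follows each comma in the source strings.
df = cleaned.str.split(",", expand=True).apply(lambda col: col.str.strip())

# Map 0 -> 'A', 1 -> 'B', ...; keys with no matching column are ignored.
df = df.rename(columns=dict(enumerate(string.ascii_uppercase)))

print(df)
```

The resulting values are still strings; converting them to numbers would be a separate step.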

Origin blog.csdn.net/weixin_40061485/article/details/125663779