Split a column of data into multiple columns and clean up redundant symbols
Process a column of data
Let's take a look at the data format first. The data has only one column, and each value contains square brackets and single quotation marks. We want to remove those symbols and split each value into multiple columns.
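To make the format concrete, here is a minimal sketch of a frame in that shape. The values are hypothetical and invented only for illustration; the name `data` and the column label `0` are taken from the code later in this post.

```python
import pandas as pd

# Hypothetical sample: these values are invented for illustration only.
data = pd.DataFrame({0: ["['a', 'b', 'c']", "['d', 'e', 'f']"]})

row = data.loc[0]   # one row of the frame, returned as a pandas Series
value = row[0]      # the string stored in column 0 of that row
```

Each cell here is a single string that merely looks like a list, which is why the brackets and quotes have to be removed as text.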
Data processing
1. First of all, we need to know that each row returned by data.loc[i] is a pandas.core.series.Series, not a plain string. To replace the square brackets and single quotation marks, we must take the string out with temp[0] rather than work on the row itself. The specific code is as follows. We use a loop to take out each row, use the replace function to strip the symbols, and then write the processed value back over the original data.
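for i in range(len(data)):
    temp = data.loc[i]
    # Remove the single quotes and the left and right square brackets.
    # If you know a more concise way, please leave a comment.
    temp = temp[0].replace("'", "")
    temp = temp.replace("[", "")
    temp = temp.replace("]", "")
    # Write the processed value back over the original row
    data.loc[i] = temp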
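The loop can also be written without explicit iteration. The following is only a sketch of one more concise option, not the method used in this post: it relies on the pandas `Series.str.replace` method with a regular expression. The helper name `strip_brackets_and_quotes` and the sample values are hypothetical.

```python
import pandas as pd

def strip_brackets_and_quotes(series: pd.Series) -> pd.Series:
    """Remove every '[', ']' and single quote from each string in the Series."""
    # The character class matches any one of: [  ]  '
    return series.str.replace(r"[\[\]']", "", regex=True)

# Hypothetical sample data with the same shape as the frame described above.
data = pd.DataFrame({0: ["['a','b','c']", "['d','e','f']"]})

# Overwrite column 0 with the cleaned strings in one vectorized step.
data[0] = strip_brackets_and_quotes(data[0])
```

This handles all three symbols in one pass over the column instead of three replace calls per row.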
After processing, each value is a plain comma-separated string with no square brackets or single quotation marks.
2. Then use str.split() to split the data on commas and name the resulting columns A, B, C, and so on.
data[0] selects column 0, which is the column being split.
df = data[0].str.split(',', expand=True).rename(columns={
    0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'})
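If the original values had a space after each comma, the split pieces keep that leading space. The sketch below trims it and also generates the column names instead of listing them by hand. This is an assumption about what the data may look like, not something the post's data is known to need, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned data: brackets and quotes are already removed.
data = pd.DataFrame({0: ["a, b, c", "d, e, f"]})

# Split column 0 on commas; expand=True returns one column per piece.
df = data[0].str.split(",", expand=True)

# Trim the whitespace left around each piece.
df = df.apply(lambda col: col.str.strip())

# Name the columns A, B, C, ... according to how many pieces there are.
# This simple scheme only works for up to 26 columns.
df.columns = [chr(ord("A") + k) for k in range(df.shape[1])]
```

Generating the names from `df.shape[1]` avoids hard-coding the number of columns in the rename mapping.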