Percentages exported from pandas to Excel show up as "numbers stored as text": how to fix it?

Keywords:

python, pandas, to_excel, numbers stored as text

Problem description:

I wrote a Python script that uses pandas for data statistics and analysis, and exported the results with pandas to_excel() to an Excel file for the team. But I ran into a problem: when my boss and colleagues opened the Excel file, the percentage values did not display properly, and Excel flagged them as "number stored as text".

How can these percentage values be made to display properly?

Approaches:

1, Look for the solution on our own side first. When we hand an output document to the team, we are responsible for its quality ourselves; we should not require or expect our boss and colleagues to clean it up.

2, A manual fix: it works immediately, but it is clumsy.

Open the Excel file, select one column that Excel flags as "number stored as text", click Data > Text to Columns, click Next twice in the wizard that pops up, then click Finish. Each pass converts only one column, so if several columns are affected you repeat it once per column. Inside Excel there is no shortcut.

This method is a bit clumsy, but in an emergency it can be applied on the spot and fixes the problem right away.
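The same Text to Columns repair can also be scripted for a file that has already gone out. A minimal sketch, assuming openpyxl is installed and that every text cell shaped like '12.34%' should become a number; the helper `repair_percent_text` and the sample workbook are hypothetical:

```python
import os
import re
import tempfile

from openpyxl import Workbook, load_workbook

# Build a tiny workbook that reproduces the problem: percentages stored as text
path = os.path.join(tempfile.mkdtemp(), "result.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["action", "abs_rate"])
ws.append(["click_buy", "0.12%"])
ws.append(["pay_success", "50.65%"])
wb.save(path)

PCT_TEXT = re.compile(r"^-?\d+(\.\d+)?%$")  # matches text such as '50.65%'


def repair_percent_text(xlsx_path):
    """Hypothetical helper: turn '50.65%' text cells into 0.5065 shown as 0.00%."""
    book = load_workbook(xlsx_path)
    for sheet in book.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                text = cell.value.strip() if isinstance(cell.value, str) else None
                if text and PCT_TEXT.match(text):
                    cell.value = float(text[:-1]) / 100  # store a real number
                    cell.number_format = "0.00%"         # let Excel show the %
    book.save(xlsx_path)


repair_percent_text(path)
fixed = load_workbook(path).active["B3"]
print(fixed.value, fixed.number_format)
```

Only cells that match the pattern are touched, so headers and labels keep their text.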

If a single file contains many such "numbers stored as text", or you produce this kind of file regularly, the better practice is of course to fix the script and solve the problem at its root.

Script-based solutions:

0, The initial script

For these study notes I built a minimal reproduction, a small data set and a short script, as follows:


import pandas as pd

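# Build a sample data set: a conversion funnel (action, count)
df = pd.DataFrame([['文章阅读量', 982000],        # article views
                   ['查看原文访问详情页', 8912],  # clicked through to the detail page
                   ['翻到详情页底部', 4514],      # scrolled to the bottom of the detail page
                   ['点击购买', 1207],            # clicked "buy"
                   ['支付成功', 124]],            # payment succeeded
                  columns=['action', 'count'])

# Absolute conversion rate (vs. the first step) and
# relative conversion rate (next step / current step)
df['abs_rate'] = df['count'] / df['count'].values[0]
df['opp_rate'] = df['count'].shift(periods=-1) / df['count']
df = df.fillna(0)  # the last step has no next step, so its relative rate is NaN -> 0

# Format the rates as percentage strings, e.g. 0.0091 -> '0.91%'
df['abs_rate'] = df['abs_rate'].apply(lambda x: format(x, '.2%'))
df['opp_rate'] = df['opp_rate'].apply(lambda x: format(x, '.2%'))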

df.to_excel('result.xlsx', index=False)
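To see why Excel complains, it helps to check what `format(x, '.2%')` returns. A quick check, not part of the original script (step labels shortened to English for brevity), shows that the rates start out numeric but become Python `str` values after formatting, and to_excel() writes strings as text cells:

```python
import pandas as pd

# Same funnel counts as the script above; step labels shortened to English
df = pd.DataFrame({"action": ["views", "detail", "bottom", "buy", "paid"],
                   "count": [982000, 8912, 4514, 1207, 124]})

rates = df["count"] / df["count"].iloc[0]             # numeric (float) column
formatted = rates.apply(lambda x: format(x, ".2%"))  # the script's formatting step

# The formatted values are plain strings such as '0.91%'; to_excel() writes
# them as text cells, which is what triggers Excel's warning
print(type(rates.iloc[1]), rates.iloc[1])
print(type(formatted.iloc[1]), formatted.iloc[1])
```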

1, Only one sheet: use to_csv() instead

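If there is only one table, you can stop using to_excel() and switch to to_csv(). When Excel opens a CSV file it parses each field itself, so strings such as 0.91% come back as numeric percentages rather than text. The code: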


df.to_csv('result.csv',encoding='utf_8_sig',sep=',',index=False)

The two key parameters:

  • encoding='utf_8_sig' instead of the default utf-8: writes a UTF-8 byte-order mark so that Excel displays the Chinese text correctly instead of garbling it;
  • index=False: does not write the DataFrame's index, which here is just a column of meaningless row numbers.
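A minimal check of what these two parameters do, assuming a throwaway path in a temporary directory; the small frame is hypothetical. The file starts with the UTF-8 byte-order mark (EF BB BF), which Excel relies on to recognize a CSV as UTF-8, and the header has no unnamed index column:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample: Chinese labels plus percentage strings
df = pd.DataFrame({"action": ["文章阅读量", "支付成功"],
                   "count": [982000, 124],
                   "abs_rate": ["100.00%", "0.01%"]})

path = os.path.join(tempfile.mkdtemp(), "result.csv")
df.to_csv(path, encoding="utf_8_sig", sep=",", index=False)

with open(path, "rb") as f:
    raw = f.read()

# encoding='utf_8_sig' prepends the UTF-8 byte-order mark EF BB BF
has_bom = raw.startswith(b"\xef\xbb\xbf")
# index=False: the header starts with 'action', with no unnamed index column
header = raw.decode("utf_8_sig").splitlines()[0]
print(has_bom, header)
```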

In practice, though, the output of a statistical analysis usually consists of several sheets, and a CSV file can hold only one, so it's back to to_excel() after all. Argh!

2, Multiple sheets: there is no perfect option, so make a choice

I searched through a large number of web pages but did not find a direct solution. So I can only pick one of the following two outcomes:

  • Displayed as percentages, but Excel flags the cells as numbers stored as text when the spreadsheet is opened (the status quo)
  • Displayed as decimals, with no warning when the spreadsheet is opened

To display decimals, simply comment out the two percentage-formatting statements in the script:


#df['abs_rate'] = df['abs_rate'].apply(lambda x:format(x, '.2%'))
#df['opp_rate'] = df['opp_rate'].apply(lambda x:format(x, '.2%'))
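What the decimal option buys can be confirmed by reading the workbook back. A sketch assuming openpyxl is installed; the extra `abs_text` column is hypothetical and holds the '.2%' strings for comparison. In openpyxl, `data_type` 'n' marks a numeric cell and 's' a string cell:

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"action": ["views", "buy"], "count": [982000, 1207]})
df["abs_rate"] = df["count"] / df["count"].iloc[0]                 # left as float
df["abs_text"] = df["abs_rate"].apply(lambda x: format(x, ".2%"))  # '.2%' strings

path = os.path.join(tempfile.mkdtemp(), "result.xlsx")
df.to_excel(path, index=False)

ws = load_workbook(path).active
num_cell, text_cell = ws["C3"], ws["D3"]  # the 'buy' row: abs_rate, abs_text
# openpyxl data_type: 'n' = numeric cell, 's' = string (text) cell
print(num_cell.value, num_cell.data_type)
print(text_cell.value, text_cell.data_type)
```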

I'm really not happy with this! I hope to find the answer one day and update this article. For now, these notes end here.

By the way, do you have a solution? When you export several DataFrames to one Excel file as separate sheets, how can the percentages display normally without any warning?
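One possible approach, not one the author tested, is to keep the rates as floats and let Excel handle the percentage display through a cell number format: the cells then hold real numbers, so there is no text warning. A minimal sketch assuming the openpyxl engine is installed; the sheet names, columns and the `sheets` mapping are hypothetical. With the xlsxwriter engine, the analogous route is a format from `writer.book.add_format({'num_format': '0.00%'})` passed to `worksheet.set_column`.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Rates stay floats: no format(x, '.2%') step
funnel = pd.DataFrame({"action": ["view", "buy"], "count": [982000, 1207]})
funnel["abs_rate"] = funnel["count"] / funnel["count"].iloc[0]
summary = pd.DataFrame({"metric": ["pay_rate"], "value": [124 / 982000]})

# Hypothetical mapping: sheet name -> (frame, columns to show as percentages)
sheets = {"funnel": (funnel, ["abs_rate"]), "summary": (summary, ["value"])}

path = os.path.join(tempfile.mkdtemp(), "result.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name, (frame, pct_cols) in sheets.items():
        frame.to_excel(writer, sheet_name=name, index=False)
        ws = writer.sheets[name]  # the openpyxl Worksheet just written
        for col in pct_cols:
            # openpyxl columns are 1-based; row 1 holds the header
            col_idx = frame.columns.get_loc(col) + 1
            for row in range(2, len(frame) + 2):
                ws.cell(row=row, column=col_idx).number_format = "0.00%"

cell = load_workbook(path)["funnel"]["C3"]
print(cell.value, cell.number_format)  # a float, displayed by Excel as 0.12%
```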

Origin: www.cnblogs.com/jjliu/p/11499120.html