App data analysis of the Google Play Store

1. Analysis purpose: use app data from the Google Play Store to guide business direction

 

2. Data

Import libraries

Import the data
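The original does not list its libraries or file name. A minimal sketch, assuming pandas as the framework and a CSV laid out like the public googleplaystore.csv; the three sample rows below are hypothetical stand-ins so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed googleplaystore.csv layout.
# With the real file you would call pd.read_csv("googleplaystore.csv") instead.
csv_text = """App,Category,Rating,Reviews,Size,Installs,Type,Price
Photo Editor,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0
Chat App,SOCIAL,4.5,20000,Varies with device,"1,000,000+",Free,0
Puzzle Pro,GAME,,300,201k,"5,000+",Paid,$1.99
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 8); the empty Rating cell is read as NaN
```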

This analysis uses only the columns 'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs' and 'Type'.
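Selecting the subset is a plain column indexing step. A sketch on a hypothetical two-row frame that also carries an unused 'Price' column:

```python
import pandas as pd

# Hypothetical frame with one extra column the analysis does not need
df = pd.DataFrame({
    "App": ["A", "B"], "Category": ["GAME", "SOCIAL"], "Rating": [4.1, 4.5],
    "Reviews": ["159", "20000"], "Size": ["19M", "201k"],
    "Installs": ["10,000+", "1,000,000+"], "Type": ["Free", "Paid"],
    "Price": ["0", "$1.99"],
})

# Keep only the seven columns analysed here; copy() avoids chained-assignment warnings later
cols = ["App", "Category", "Rating", "Reviews", "Size", "Installs", "Type"]
df = df[cols].copy()
print(df.columns.tolist())
```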

Take a quick look at the data

Check the number of rows and columns

Check the number of non-null values in each column
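The three inspection steps above map onto standard pandas calls. A sketch on a small hypothetical frame with a few missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Rating": [4.1, np.nan, 3.9],
    "Type": ["Free", "Free", np.nan],
})

print(df.head())       # first rows, for a quick look
print(df.shape)        # (rows, columns) -> (3, 3)
non_null = df.count()  # non-null values per column
print(non_null)
df.info()              # dtypes and non-null counts in one summary
```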

Several columns have many missing values that need to be cleaned.

 

3. Data cleaning

App processing

Check for duplicate values
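A sketch of the duplicate check on hypothetical rows. It counts both repeated app names and fully identical rows, since the same app can appear twice with different review counts; which of the two the original used is not stated.

```python
import pandas as pd

df = pd.DataFrame({
    "App": ["Chat", "Chat", "Maps", "Chat"],
    "Category": ["SOCIAL", "SOCIAL", "TRAVEL", "SOCIAL"],
    "Reviews": [100, 100, 50, 120],
})

# Rows whose app name already appeared earlier
dup_names = df["App"].duplicated().sum()
# Rows that are exact copies of an earlier row (all columns equal)
dup_rows = df.duplicated().sum()
print(dup_names, dup_rows)
```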

If duplicates are found, don't delete them yet. Clean the columns that contain abnormal values first, so that outliers in the other columns are not left behind.

Category processing

The Category column contains an outlier value

Delete this record
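The original does not show the outlier value. This sketch assumes it is the string "1.9", as in the widely shared Kaggle copy of the dataset, where one shifted row puts 1.9 in Category and 19 in Rating; the rows below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Category": ["GAME", "SOCIAL", "GAME", "1.9"],
    "Rating": [4.1, 4.5, 3.8, 19.0],
})

# The odd value stands out in the frequency table with a count of 1
print(df["Category"].value_counts())

# Drop the record whose Category is the assumed outlier "1.9"
df = df[df["Category"] != "1.9"].reset_index(drop=True)
print(df["Category"].unique())
```

Because the shifted row is the same record for both columns, this filter also removes the rating of 19.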

Rating processing

Fill missing ratings with the mean

There is one abnormal record with a rating of 19 (ratings only go from 1 to 5). It is the same record as the Category outlier, so deleting that record removes it as well.
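A sketch of the Rating step on hypothetical values: first confirm that the remaining ratings sit on the 1 to 5 scale, then fill the gaps with the column mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Rating": [4.0, np.nan, 5.0, 3.0, np.nan]})

# After the Category outlier is gone, every real rating should be within 1-5
assert df["Rating"].dropna().between(1, 5).all()

# mean() skips NaN, so this is (4 + 5 + 3) / 3 = 4.0
mean_rating = df["Rating"].mean()
df["Rating"] = df["Rating"].fillna(mean_rating)
print(df["Rating"].tolist())  # [4.0, 4.0, 5.0, 3.0, 4.0]
```

Filling with the mean keeps the row count intact but pulls every category's average toward the overall mean, which is worth remembering when ratings are compared later.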

Reviews cleaning

value_counts shows that the values are spread very widely and look like plain numbers
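The original does not show its conversion. A minimal sketch, assuming the review counts are stored as digit-only strings once the shifted row is removed:

```python
import pandas as pd

df = pd.DataFrame({"Reviews": ["159", "967", "87510", "215644", "967"]})

# Many distinct values, all of them digit strings
print(df["Reviews"].value_counts().head())

# Turn the text column into 64-bit integers
df["Reviews"] = pd.to_numeric(df["Reviews"]).astype("int64")
print(df["Reviews"].dtype)  # int64
```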

Size cleaning

Convert to floating point

Fill sizes of 0 with the average
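A sketch of the Size steps, assuming values look like "19M", "201k" or "Varies with device" (the usual format in this dataset). The helper size_to_mb is hypothetical, takes 1 MB = 1024 KB, and maps anything non-numeric to 0; the average used for filling is taken over the non-zero sizes only, which is an assumption since the original does not say.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["19M", "201k", "Varies with device", "8.7M"]})

def size_to_mb(value: str) -> float:
    """Hypothetical helper: '19M' -> 19.0, '201k' -> 201/1024, anything else -> 0.0."""
    if value.endswith("M"):
        return float(value[:-1])
    if value.endswith("k"):
        return float(value[:-1]) / 1024
    return 0.0  # e.g. "Varies with device"

df["Size"] = df["Size"].apply(size_to_mb)

# Replace the 0 placeholders with the mean of the real (non-zero) sizes
mean_size = df.loc[df["Size"] > 0, "Size"].mean()
df.loc[df["Size"] == 0, "Size"] = mean_size
print(df["Size"].round(3).tolist())
```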

Installs cleaning

There are only a few distinct values, so replace them directly

Convert to a numeric type
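A sketch of the Installs steps, assuming labels such as "10,000+": strip the "+" and the thousands separators, then convert. Note that each label is a lower bound, so the result is a bucket value rather than an exact count.

```python
import pandas as pd

df = pd.DataFrame({"Installs": ["10,000+", "500,000+", "1,000,000+", "10,000+"]})

# Only a handful of distinct labels
print(df["Installs"].value_counts())

df["Installs"] = (
    df["Installs"]
    .str.replace("+", "", regex=False)  # literal replacement, not a regex
    .str.replace(",", "", regex=False)
    .astype("int64")
)
print(df["Installs"].tolist())  # [10000, 500000, 1000000, 10000]
```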

Type processing

df.info() shows that Type has a missing value. value_counts hides it by default, so pass dropna=False to see it.

Delete this record
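A sketch of the Type step on hypothetical rows, showing why dropna=False matters and then dropping the record:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"App": ["A", "B", "C", "D"],
                   "Type": ["Free", "Paid", np.nan, "Free"]})

print(df["Type"].value_counts())              # the NaN is silently left out
counts_all = df["Type"].value_counts(dropna=False)
print(counts_all)                             # NaN now shows up with a count of 1

# Delete the record with the missing Type
df = df.dropna(subset=["Type"]).reset_index(drop=True)
```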

With the cleaning done, start analyzing the data

 

4. Data processing and analysis

Category data

Number of categories
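Counting distinct categories is a single call. A sketch on hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["GAME", "SOCIAL", "GAME", "TOOLS", "SOCIAL", "GAME"]})
n_categories = df["Category"].nunique()
print(n_categories)  # 3
```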

Counting and sorting the apps in each category shows which categories are most popular with developers
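value_counts already returns the counts sorted from most to least frequent. A sketch on hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["GAME", "SOCIAL", "GAME", "TOOLS", "SOCIAL", "GAME"]})

# Apps per category, largest first
app_counts = df["Category"].value_counts()
print(app_counts)
```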

Ranking categories by installs: entertainment and social apps are what users need most
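A sketch of the install ranking on hypothetical numbers. It assumes "installation volume" means the total installs per category; a per-app mean would also be reasonable and could rank differently.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "SOCIAL", "ENTERTAINMENT", "SOCIAL", "TOOLS"],
    "Installs": [1_000_000, 5_000_000, 10_000_000, 1_000_000, 500_000],
})

# Total installs per category, largest first
installs_by_cat = df.groupby("Category")["Installs"].sum().sort_values(ascending=False)
print(installs_by_cat)
```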

Reviews by category: social and game apps receive the most reviews
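A sketch of reviews per category on hypothetical numbers. It reports both the total and the per-app mean, since a category with many apps can lead on totals alone.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "SOCIAL", "GAME", "TOOLS", "SOCIAL"],
    "Reviews": [300, 500, 200, 50, 100],
})

reviews_by_cat = (
    df.groupby("Category")["Reviews"]
    .agg(["sum", "mean"])             # total and per-app reviews
    .sort_values("sum", ascending=False)
)
print(reviews_by_cat)
```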

Average rating by category does not line up with the install and review rankings and needs further analysis
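A sketch of the per-category rating summary on hypothetical numbers. Showing the app count next to the mean is one possible starting point for the further analysis: a small category can top the ranking on very few apps.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "SOCIAL", "EVENTS", "GAME", "SOCIAL"],
    "Rating": [4.0, 4.2, 4.5, 4.4, 3.8],
})

rating_by_cat = (
    df.groupby("Category")["Rating"]
    .agg(["mean", "count"])           # average rating and number of apps
    .sort_values("mean", ascending=False)
)
print(rating_by_cat)
```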

Type data

Free apps make up a large share and paid apps only a small one, so free is still the mainstream
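Shares rather than raw counts come from normalize=True. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Type": ["Free"] * 9 + ["Paid"]})

# Fraction of apps of each type
type_share = df["Type"].value_counts(normalize=True)
print(type_share)
```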

Analyzing Category and Type together
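The original does not show how the two columns were combined. One common way, sketched here on hypothetical rows, is a cross-tabulation of counts plus the same table normalized per category:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "GAME", "SOCIAL", "TOOLS", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free", "Paid", "Paid"],
})

table = pd.crosstab(df["Category"], df["Type"])                      # app counts
share = pd.crosstab(df["Category"], df["Type"], normalize="index")   # row shares
print(table)
print(share.round(2))
```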

Ratio of reviews to installs
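A sketch of the reviews-to-installs ratio on hypothetical numbers. It assumes the ratio is computed per category as total reviews divided by total installs; averaging per-app ratios is another option.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "GAME", "SOCIAL", "TOOLS"],
    "Reviews": [5_000, 1_000, 40_000, 100],
    "Installs": [100_000, 100_000, 1_000_000, 10_000],
})

totals = df.groupby("Category")[["Reviews", "Installs"]].sum()
totals["review_ratio"] = totals["Reviews"] / totals["Installs"]  # reviews per install
totals = totals.sort_values("review_ratio", ascending=False)
print(totals)
```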

Correlation: the number of reviews is strongly correlated with the number of installs. All other pairs have coefficients below 0.1 and can be considered uncorrelated (as a rule of thumb, above 0.5 counts as correlated and above 0.3 as weakly correlated).
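A sketch of the correlation step on hypothetical numbers, using pandas' default Pearson coefficient. The helper strength is hypothetical and simply encodes the thresholds stated above.

```python
import pandas as pd

df = pd.DataFrame({
    "Rating":   [4.1, 3.9, 4.7, 4.5, 4.3],
    "Reviews":  [100, 1_000, 10_000, 50_000, 200_000],
    "Installs": [10_000, 100_000, 1_000_000, 5_000_000, 10_000_000],
    "Size":     [19.0, 14.0, 8.7, 25.0, 2.8],
})

corr = df.corr()  # Pearson correlation matrix of the numeric columns
print(corr.round(2))

def strength(r: float) -> str:
    """Hypothetical helper applying the rule of thumb from the text."""
    r = abs(r)
    if r > 0.5:
        return "correlated"
    if r > 0.3:
        return "weakly correlated"
    return "uncorrelated"

print(strength(corr.loc["Reviews", "Installs"]))
```

Pearson is sensitive to a few very large apps; rank-based df.corr(method="spearman") is a reasonable cross-check for skewed counts like these.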

 


Source: www.cnblogs.com/daisyxxx/p/12682827.html