1. Analysis purpose: use Google Play Store app data to guide business direction
2. Data
Import libraries
Import data
This analysis uses only the columns 'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type'
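A minimal sketch of the import and column-selection steps. The source does not name the data file, so `googleplaystore.csv` in the comment is an assumption, and the example reads a tiny inline sample whose rows are invented for illustration.

```python
import io

import pandas as pd

# Invented sample rows standing in for the real file; with the real data this
# would be something like pd.read_csv("googleplaystore.csv") (assumed file name).
sample_csv = io.StringIO(
    "App,Category,Rating,Reviews,Size,Installs,Type,Price\n"
    "Sketch Pad,ART_AND_DESIGN,4.1,159,19M,\"10,000+\",Free,0\n"
    "Chat Now,SOCIAL,4.3,78158306,Varies with device,\"1,000,000,000+\",Free,0\n"
    "Puzzle Pro,GAME,,2300,201k,\"50,000+\",Paid,$1.99\n"
)

# usecols keeps only the seven columns analyzed here
cols = ["App", "Category", "Rating", "Reviews", "Size", "Installs", "Type"]
df = pd.read_csv(sample_csv, usecols=cols)
print(df.head())
```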
Take a quick look at the data
View the number of rows and columns
View the number of non-null values in each column
Many values are missing and need to be cleaned
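The inspection steps above could look roughly like this. The sample frame is invented; only `shape`, `count`, `isna` and `info` are standard pandas calls.

```python
import numpy as np
import pandas as pd

# Invented sample with a few gaps, standing in for the real data
df = pd.DataFrame({
    "App": ["Sketch Pad", "Chat Now", "Puzzle Pro", "Note It"],
    "Rating": [4.1, np.nan, np.nan, 4.5],
    "Type": ["Free", "Free", "Paid", None],
})

n_rows, n_cols = df.shape          # number of rows and columns
non_null = df.count()              # non-null values per column
missing = df.isna().sum()          # missing values per column

print(n_rows, n_cols)
df.info()                          # same non-null counts plus dtypes
print(missing)
```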
3. Data cleaning
App processing
Check for duplicate values
There are duplicate values, but do not delete them yet. So that outliers in other columns are not left behind, handle the columns with abnormal values first.
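One way to check for duplicates without dropping anything is sketched below, on invented sample rows. Using the `App` column as the duplicate key is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "App": ["Sketch Pad", "Chat Now", "Chat Now", "Puzzle Pro"],
    "Reviews": [159, 78158306, 78158306, 2300],
})

# Flag every row whose app name appears more than once; nothing is dropped yet
dup_mask = df["App"].duplicated(keep=False)
print(dup_mask.sum())
print(df.loc[dup_mask, "App"].unique())
```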
Category processing
There is an outlier
Delete it
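A sketch of spotting and removing a stray category. The rows are invented, and the outlier value `"1.9"` is an assumption about what the bad record looks like.

```python
import pandas as pd

df = pd.DataFrame({
    "App": ["Sketch Pad", "Chat Now", "Broken Row"],
    "Category": ["ART_AND_DESIGN", "SOCIAL", "1.9"],  # "1.9" is an invented outlier
    "Rating": [4.1, 4.3, 19.0],
})

# value_counts makes a stray category easy to spot
print(df["Category"].value_counts())

# Keep every row except the one with the abnormal category
df = df[df["Category"] != "1.9"].reset_index(drop=True)
```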
Rating processing
Fill missing values with the mean
There is an abnormal record with a rating of 19; it is the same record as the Category outlier
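Filling missing ratings with the mean might be done as follows. The values are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Rating": [4.0, np.nan, 5.0, np.nan, 3.0]})

# describe() exposes the range, which is how a value such as 19 would stand out
print(df["Rating"].describe())

# Replace missing ratings with the mean of the known ratings
mean_rating = df["Rating"].mean()
df["Rating"] = df["Rating"].fillna(mean_rating)
```

Removing the abnormal record before computing the mean keeps it from inflating the fill value.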
Reviews cleaning
value_counts shows that the values are spread very widely, and they look like numeric data
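If the column turns out to be stored as text, one option is `pd.to_numeric`. This is a sketch on invented values; the source does not say which conversion was used.

```python
import pandas as pd

df = pd.DataFrame({"Reviews": ["159", "78158306", "2300", "159"]})

# Many distinct values with low counts suggest a numeric column stored as text
print(df["Reviews"].value_counts().head())

# errors="coerce" turns anything unparseable into NaN instead of raising
df["Reviews"] = pd.to_numeric(df["Reviews"], errors="coerce")
```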
Size cleaning
Convert to floating point
Replace sizes of 0 with the mean
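A sketch of the size conversion, under several assumptions: sizes are strings such as `19M` or `201k`, entries without a usable size (for example `Varies with device`) become 0, and the mean is taken over the known, non-zero sizes. The helper `size_to_mb` is hypothetical.

```python
import pandas as pd

def size_to_mb(text: str) -> float:
    """Convert an assumed size string such as '19M' or '201k' to megabytes."""
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return 0.0  # e.g. "Varies with device": no usable size

df = pd.DataFrame({"Size": ["10M", "512k", "Varies with device", "20M"]})
df["Size"] = df["Size"].map(size_to_mb)

# Replace the 0 placeholders with the mean of the known sizes
known_mean = df.loc[df["Size"] > 0, "Size"].mean()
df["Size"] = df["Size"].replace(0.0, known_mean)
```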
Installs cleaning
There are only a few distinct values, so replace them directly
Convert the data type
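The replacement and conversion could be sketched like this, assuming install counts are bucket strings such as `10,000+`. The sample values are invented.

```python
import pandas as pd

df = pd.DataFrame({"Installs": ["10,000+", "1,000,000+", "50+", "10,000+"]})

# Only a handful of distinct buckets exist, so they can be inspected in full
print(df["Installs"].value_counts())

# Strip the thousands separators and the trailing "+", then convert to integers
df["Installs"] = (
    df["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
    .astype("int64")
)
```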
Type processing
df.info() shows that there is an NA value; the dropna parameter is needed here for it to show up in the counts
Delete this record
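A sketch of surfacing and removing the missing `Type` value. Reading the note above as `value_counts(dropna=False)` is an assumption, and the rows are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "App": ["Sketch Pad", "Chat Now", "Puzzle Pro", "No Type"],
    "Type": ["Free", "Free", "Paid", np.nan],
})

# By default value_counts hides NaN; dropna=False includes it in the tally
type_counts = df["Type"].value_counts(dropna=False)
print(type_counts)

# Drop the record whose Type is missing
df = df.dropna(subset=["Type"]).reset_index(drop=True)
```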
With cleaning complete, start analyzing the data
4. Data processing and analysis
Category data
Number of categories
Number of apps in each category, sorted: this shows which categories are most popular with developers
Install volume by category, sorted: entertainment and social apps are the most in demand among users
Reviews by category: social and game apps receive the most reviews
Ratings by category do not follow the same pattern as the other metrics and need further analysis
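The per-category figures above can be sketched with `value_counts` and `groupby`. The data is invented, and aggregating installs and reviews with `sum` and ratings with `mean` is an assumption; the source does not say which aggregation was used.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "GAME", "SOCIAL", "TOOLS", "TOOLS", "TOOLS"],
    "Installs": [1000, 5000, 100000, 100, 500, 1000],
    "Reviews": [50, 400, 9000, 5, 20, 30],
    "Rating": [4.0, 4.4, 4.1, 4.6, 4.2, 4.4],
})

n_categories = df["Category"].nunique()
apps_per_category = df["Category"].value_counts()   # sorted, largest first

by_category = df.groupby("Category")
installs_rank = by_category["Installs"].sum().sort_values(ascending=False)
reviews_rank = by_category["Reviews"].sum().sort_values(ascending=False)
rating_rank = by_category["Rating"].mean().sort_values(ascending=False)

print(apps_per_category, installs_rank, reviews_rank, rating_rank, sep="\n\n")
```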
Type data
Free apps make up a large share and paid apps a small one; free remains the mainstream
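The free and paid shares can be computed with `value_counts(normalize=True)`. The sample below is invented and does not reflect the real proportions.

```python
import pandas as pd

df = pd.DataFrame({"Type": ["Free", "Free", "Free", "Paid"]})

# normalize=True returns shares instead of raw counts
type_share = df["Type"].value_counts(normalize=True)
print(type_share)
```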
Analyze Category and Type together
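One possible way to combine the two dimensions is a cross-tabulation plus a two-level `groupby`. This is a sketch on invented rows, not necessarily the breakdown used in the original analysis.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "GAME", "SOCIAL", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free", "Free"],
    "Installs": [1000, 500, 100000, 100, 900],
})

# App counts per Category and Type (missing combinations become 0)
counts = pd.crosstab(df["Category"], df["Type"])

# Total installs for each Category and Type pair
installs = df.groupby(["Category", "Type"])["Installs"].sum()
print(counts, installs, sep="\n\n")
```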
Ratio of reviews to installs
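A sketch of a reviews-per-install ratio. Computing it per category, and summing before dividing, are both assumptions; the numbers are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Category": ["GAME", "GAME", "SOCIAL", "TOOLS"],
    "Reviews": [50, 450, 9000, 0],
    "Installs": [1000, 4000, 100000, 0],
})

# Sum per category first, then divide, so large apps are weighted by their size
totals = df.groupby("Category")[["Reviews", "Installs"]].sum()

# Treat zero installs as missing to avoid dividing by zero
ratio = totals["Reviews"] / totals["Installs"].replace(0, np.nan)
print(ratio.sort_values(ascending=False))
```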
Correlation: The number of reviews is strongly correlated with the number of installs. The other coefficients are below 0.1 and can be treated as uncorrelated (above 0.5 can be considered correlated, and above 0.3 weakly correlated)
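The correlation check can be sketched with `DataFrame.corr`, which computes Pearson coefficients by default. The data is invented, and the helper `strength` is hypothetical: it applies the 0.5 and 0.3 thresholds from the text, using the absolute value so that negative correlations are classified the same way.

```python
import pandas as pd

df = pd.DataFrame({
    "Rating": [4.1, 3.9, 4.5, 4.0, 4.3],
    "Reviews": [10, 200, 3000, 45000, 600000],
    "Size": [12.0, 30.0, 8.0, 25.0, 15.0],
    "Installs": [100, 1000, 10000, 100000, 1000000],
})

# Pearson correlation between every pair of numeric columns
corr = df.corr()
print(corr.round(2))

def strength(r: float) -> str:
    """Label a coefficient using the thresholds stated in the text."""
    r = abs(r)
    return "correlated" if r > 0.5 else "weakly correlated" if r > 0.3 else "uncorrelated"

print(strength(corr.loc["Reviews", "Installs"]))
```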