Analysis of Google Play Store Apps

Google Play Store offers applications, videos, music, e-books and other digital products. This article analyzes data on Google Play Store apps to understand which kinds of apps are more popular. The data used in this article was downloaded from Kaggle; the download address is:

1. Ask questions

1. What share of the Google Play Store does each category hold, and which category has the largest number of applications?

2. What are the main categories of apps with high ratings?

3. Which type of app has the most reviews?

4. Which type of application has the largest number of installations?

5. Which audiences are the apps aimed at?

6. Does being paid or free affect an app's rating?

7. Does the size of an application affect its number of installs?

2. Understand the data

There are 13 fields in the table, and the meaning of each field is as follows:

App: the name of the application

Category: The category to which the application belongs

Rating: user rating

Reviews: the number of user reviews

Size: Application size

Installs: the number of installs

Type: whether the application is paid or free

Price: the price of the application

Content Rating: the target audience (age group)

Genres: the genres of the application

Last Updated: The date of the latest update

Current Ver: the latest version

Android Ver: Applicable Android version

3. Data cleaning

1. The data set contains 10842 rows and 13 columns.

2. Rename the columns. For ease of understanding, the table headers are changed to Chinese, as shown in the figure below:

change column names

3. Remove duplicate values

In the googleplaystore data set, the first column, "application name", serves as the unique identifier, so rows that repeat an application name are deleted as follows:

remove duplicates
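The deduplication step above can be sketched in Python as follows. This is only an illustration, not the spreadsheet operation the article used: the function name `drop_duplicate_apps`, the sample rows, and the choice to keep the first occurrence of each name are all assumptions.

```python
def drop_duplicate_apps(rows, key="App"):
    """Keep only the first row seen for each application name."""
    seen = set()
    unique = []
    for row in rows:
        name = row[key]
        if name not in seen:
            seen.add(name)
            unique.append(row)
    return unique


# Hypothetical sample rows; the real table has 13 columns.
rows = [
    {"App": "Alpha", "Category": "GAME"},
    {"App": "Beta", "Category": "TOOLS"},
    {"App": "Alpha", "Category": "GAME"},
]
deduped = drop_duplicate_apps(rows)
print(len(rows), "rows before,", len(deduped), "rows after")
```

Keeping the first occurrence is an arbitrary choice here; when duplicates differ in, say, their review counts, keeping the most recent row may be preferable.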

4. Missing value processing

Searching each column for null values finds one in line 9281: the "category" value is missing, which shifts all the following columns in that row out of place.

There are four common ways to handle missing values:

Manual completion (when there are few missing values)

Remove items with missing values

Replace missing values with the mean

Replace missing values with values calculated by a statistical model

Because there is only one missing value, manual completion is chosen here. The application name clearly identifies the app as belonging to the photography category, so the value PHOTOGRAPHY is filled in.

before modification

after modification
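A minimal sketch of the manual repair, assuming the category cell is absent from the row so that every later field sits one column too far left. The helper name `repair_shifted_row`, the shortened header and the sample row are hypothetical.

```python
def repair_shifted_row(values, category):
    """Re-insert a missing category as the second field.

    `values` is a row whose category cell is absent, so every later
    field has slid one column to the left.
    """
    return values[:1] + [category] + values[1:]


HEADER = ["App", "Category", "Rating", "Reviews"]  # shortened for illustration
broken = ["Some Photo Frame App", "1.9", "19"]     # hypothetical row, category missing
fixed = repair_shifted_row(broken, "PHOTOGRAPHY")
print(dict(zip(HEADER, fixed)))
```

If the cell were present but empty instead, no shifting would be needed and the value could simply be overwritten.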

5. Consistent processing

5.1 The "size" column contains values in k, values in M, and the text "Varies with device". All values are converted to M and the unit suffix is dropped to simplify later calculations. "Varies with device" is replaced by the average size, 20.42 (M), as shown in the figure below:

M

k

Varies with device
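The size conversion can be sketched as below. The function name `size_to_mb` is hypothetical, and the factor 1 M = 1024 k is an assumption, since the article does not say which factor was used. The fallback of 20.42 is the average quoted above.

```python
def size_to_mb(raw, fallback_mb=20.42):
    """Convert a size string such as '19M' or '512k' to a number of megabytes."""
    text = raw.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        # Assumption: 1 M = 1024 k.
        return float(text[:-1]) / 1024
    # 'Varies with device' (or any other text) falls back to the average size.
    return fallback_mb


sizes = ["19M", "512k", "Varies with device"]
print([size_to_mb(s) for s in sizes])
```

Replacing the unknown sizes with the average keeps those rows in the analysis, at the cost of adding many identical values that can flatten any size-related trend.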

5.2 Remove the "+" from the "Installation Volume" column so that the values can be used in later pivot tables and statistical analysis, as shown in the figure below:

Remove the "+"

Then express the installation volume in units of ten thousand.
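As a rough sketch of this step, the function below strips the "+" and rescales the count. The name `installs_to_ten_thousands` is hypothetical, and the removal of thousands separators is an added assumption: the article only mentions the "+".

```python
def installs_to_ten_thousands(raw):
    """Turn an install string such as '10,000+' into units of ten thousand."""
    # Assumption: values may also contain thousands separators.
    digits = raw.replace("+", "").replace(",", "").strip()
    return int(digits) / 10_000


for value in ["500+", "10,000+", "1,000,000,000+"]:
    print(value, "->", installs_to_ten_thousands(value))
```

Note that each value is a lower bound ("10,000+" means at least 10,000 installs), so sums built from this column understate the true totals.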

5.3 In the "Whether to pay" column, Free and Paid are encoded as 0 and 1 respectively, to simplify later statistics and visualization.

Replace Free with 0
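The encoding amounts to a simple lookup. A small sketch, where `TYPE_CODES` and `encode_type` are hypothetical names and unknown values are mapped to `None` by assumption:

```python
TYPE_CODES = {"Free": 0, "Paid": 1}


def encode_type(raw):
    """Map 'Free' to 0 and 'Paid' to 1; anything else becomes None."""
    return TYPE_CODES.get(raw.strip())


print([encode_type(t) for t in ["Free", "Paid", "Free"]])
```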

Data cleaned up

4. Building models and data visualization

1. The proportion of the main categories of applications

Build a pivot table on the "Category" column, counting applications by their unique identifier, then plot a column chart:

top ten

The table above shows that the family category accounts for the largest share of applications in the Google Play Store, 18.95%, followed by games and tools; together the three categories account for 37.43%.
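The same count-and-share calculation can be sketched without a pivot table. The function name `category_shares` and the four-row sample are hypothetical, so the percentages printed here are not the article's figures.

```python
from collections import Counter


def category_shares(categories):
    """Return (category, count, percent of all apps), largest category first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [
        (name, count, round(100 * count / total, 2))
        for name, count in counts.most_common()
    ]


sample = ["FAMILY", "FAMILY", "GAME", "TOOLS"]  # hypothetical data
shares = category_shares(sample)
for name, count, percent in shares:
    print(f"{name}: {count} apps ({percent}%)")
```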

2. What are the main categories of applications with high scores (above 4 points)?

Use the IF function to divide the "Score" column into the intervals 1-2, 2-3, 3-4, 4-5, and exactly 5:

rating scale

Score ratio
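The binning done with the IF function can be approximated as below. The article does not state which side of each interval is closed, so this sketch assumes the lower bound is included and the upper bound excluded, with a score of exactly 5 kept separate. `rating_bucket` is a hypothetical name, and apps without a score are labelled "no rating" here.

```python
def rating_bucket(rating):
    """Assign a score to one of the intervals 1-2, 2-3, 3-4, 4-5 or 5."""
    if rating is None:
        return "no rating"
    if rating == 5:
        return "5"
    for low in (1, 2, 3, 4):
        # Assumption: lower bound inclusive, upper bound exclusive.
        if low <= rating < low + 1:
            return f"{low}-{low + 1}"
    # NaN and out-of-range values end up here.
    return "no rating"


for score in [4.3, 5.0, 1.0, None]:
    print(score, "->", rating_bucket(score))
```

Under this convention a score of exactly 4.0 lands in the 4-5 bucket; if "above 4 points" is meant strictly, the boundary test would need to change.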

About 65% of the applications have high scores (above 4 points), so the overall quality is relatively good. Here, FALSE means the application has no score. To see which categories the applications scoring above 4 points belong to, extract those rows and build a pivot table as follows:

Proportion of applications with a score of 4 or above

The category breakdown of apps scoring 4 or above follows a similar pattern to the breakdown of all apps shown earlier. Because the Google Play Store contains more family apps overall, family apps are also the most numerous among the highly rated apps.

3. The distribution of the number of reviews across application categories, and which rating range has the most reviews

Proportion of App Reviews by Type

Top 10 apps by number of reviews

The categories with the most reviews are mainly games, social, tools, family and photography. The individual apps with the most reviews are mainly well-known social apps and games, such as Facebook, Instagram and Clash of Clans, which suggests that the number of reviews rises with popularity.

The pivot chart shows that the reviews are concentrated mainly in apps rated 4-5 points:

4. Which type of application has the largest installation volume

Unit: ten thousand

Apps with 1 billion installs

The application category with the largest number of installations is games, followed by tools and social applications. Most of the applications with 1 billion installations are rated at 4-5 points.
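Totalling installs per category is a group-and-sum operation. A minimal sketch, in which `installs_by_category` is a hypothetical name and the sample figures (in units of ten thousand) are invented for illustration:

```python
from collections import defaultdict


def installs_by_category(rows):
    """Sum installs per category and sort from largest to smallest total."""
    totals = defaultdict(float)
    for category, installs in rows:
        totals[category] += installs
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (category, installs in ten thousands) pairs.
sample = [("GAME", 100.0), ("TOOLS", 50.0), ("GAME", 10.0), ("SOCIAL", 40.0)]
ranking = installs_by_category(sample)
print(ranking)
```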

5. Which audiences are the apps aimed at?

Proportion of apps by target audience

Most apps are aimed at all users, and few apps have age restrictions.

6. The proportion of paid applications, and whether being paid or free is related to ratings

Proportion of Paid Apps

Relationship between paid or free status and ratings

Paid applications make up 7.81% of the total. The share of applications scoring above 4 points is similar for paid and free applications, which indicates that whether an application is paid has little relationship with its rating.
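The comparison between paid and free apps can be sketched as below. `high_rating_share` is a hypothetical name, the sample rows are invented, and excluding unrated apps from the denominator is an assumption; the article does not say how they were treated.

```python
def high_rating_share(rows, threshold=4.0):
    """Share of rated apps scoring above `threshold`, for free (0) and paid (1)."""
    totals = {0: 0, 1: 0}
    high = {0: 0, 1: 0}
    for is_paid, rating in rows:
        if rating is None:
            continue  # assumption: unrated apps are left out
        totals[is_paid] += 1
        if rating > threshold:
            high[is_paid] += 1
    return {k: (high[k] / totals[k] if totals[k] else None) for k in totals}


# Hypothetical (is_paid, rating) pairs.
sample = [(0, 4.5), (0, 3.9), (1, 4.2), (1, None)]
result = high_rating_share(sample)
print(result)
```

Similar shares in the two groups support the article's reading, although with only 7.81% of apps being paid, the paid group is much smaller and its share is correspondingly less stable.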

7. Will the size of the application affect the number of installations of the application?

It can be seen from the above figure that most applications are smaller than 20M, and that there is no obvious relationship between application size and installation volume.
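The article draws this conclusion from a chart. One way to put a number on it, which is not something the article itself does, is the Pearson correlation coefficient between size and installs. The sketch below is a plain implementation; the function name `pearson` and the sample values are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; None when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


sizes_mb = [5.0, 12.0, 20.42, 48.0]   # hypothetical sizes in M
installs = [100.0, 1.0, 50.0, 10.0]   # hypothetical installs in ten thousands
print(pearson(sizes_mb, installs))
```

A value close to 0 would be consistent with "no obvious relationship", although Pearson's coefficient only measures linear association and is sensitive to the few apps with extremely large install counts.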

5. Summary

1. In the Google Play Store, the family category has the largest number of applications, accounting for 18.95%, followed by games and tools; together the three categories account for 37.43%.

2. About 65% of apps score above 4 points, and the category breakdown of apps scoring 4 or above is similar to the category breakdown of all apps.

3. The categories with the most reviews are mainly games and social networking, and the most-reviewed apps are mostly popular, highly rated ones. Apps rated 4-5 points receive the largest number of reviews.

4. The application category with the largest number of installations is games, followed by tools and social applications. Most of the applications with 1 billion installations are rated at 4-5 points.

5. Most applications are aimed at all users, and few applications have age restrictions.

6. Whether the app is paid or not has little to do with ratings.

7. There is no obvious relationship between application size and installation volume.

Origin blog.csdn.net/TuTu6169/article/details/129755094