E-commerce transaction data analysis

1. Purpose of analysis: Analyze past e-commerce transaction data to discover patterns and problems that can guide the business

2. Data

Import libraries

Import data

After loading the data, the first step is to use the describe and info methods to get a rough view of how the data is distributed
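As a minimal sketch of this first look, assuming the orders are already in a pandas DataFrame (the name `df` and the toy rows below are invented for illustration; only the column names come from this article):

```python
import pandas as pd

# Toy stand-in for the order table; every value here is made up.
df = pd.DataFrame({
    "orderId": [1, 2, 2, 3],
    "userId": [10, 11, 11, 12],
    "productId": [100, 0, 0, 102],
    "price": [1000, 2500, 2500, 990],      # in fen
    "payMoney": [900, 2500, 2500, -990],   # in fen
    "channelId": ["a", None, None, "b"],
})

# describe: count, mean, std, min, quartiles and max of the numeric columns
summary = df[["productId", "price", "payMoney"]].describe()
print(summary)

# info: dtype and non-null count of every column, which exposes missing values
df.info()
```

In this toy table, the min row of describe already hints at a productId of 0 and a negative payMoney, and info shows fewer non-null values for channelId than for the other columns.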

Load device_type

 

3. Data cleaning

orderId

orderId is a unique identifier within a system

Let's see if there are duplicate values
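One way to check for duplicates, sketched on a made-up table (the name `orders` and its values are assumptions for illustration):

```python
import pandas as pd

# Made-up rows; orderId 2 appears twice on purpose.
orders = pd.DataFrame({
    "orderId": [1, 2, 2, 3],
    "userId": [10, 11, 11, 12],
})

# Number of distinct order ids versus number of rows
n_rows = len(orders)
n_unique = orders["orderId"].nunique()

# duplicated() flags every repeat after the first occurrence
n_dupes = orders["orderId"].duplicated().sum()
print(n_rows, n_unique, n_dupes)
```

If `n_unique` is smaller than `n_rows`, the column contains duplicates.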

Duplicate values are generally handled last, because the cleaning of other columns may affect which of the duplicate records get deleted

So we process the other columns first

userId

For userId, we only need to check from the describe and info output above that the values are in a normal range

For order data, one user may have multiple orders, so duplicate values are reasonable

productId

The minimum value of productId is 0, so first look at the number of records with a value of 0
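Counting those records can be sketched like this, again on invented data (`orders` and its values are assumptions):

```python
import pandas as pd

# Invented rows; two of them carry the suspicious productId 0.
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "productId": [100, 0, 101, 0],
})

# A boolean mask sums to the number of True entries
zero_count = (orders["productId"] == 0).sum()
print(zero_count)
```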

There are 177 such records, which is not many; they may have been caused by how the products were put on the shelves

cityId

cityId is similar to userId: the values are in a normal range, so there is nothing to deal with

price

price has no null values, and all values are greater than 0. Note that the unit is fen (1/100 of a yuan), so convert it to yuan
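The conversion is a division by 100, since 100 fen make one yuan. A small sketch on made-up prices (`orders` and its values are assumptions):

```python
import pandas as pd

# Made-up prices stored in fen
orders = pd.DataFrame({"orderId": [1, 2], "price": [1000, 2550]})

# 100 fen = 1 yuan
orders["price"] = orders["price"] / 100
print(orders["price"].tolist())
```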

payMoney

payMoney has negative values, but the amount paid for an order cannot be negative, so the records with negative values should be deleted

Delete records with negative values

Convert the unit to yuan
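Both payMoney steps can be sketched together on invented rows (`orders` and its values are assumptions):

```python
import pandas as pd

# Invented rows; the second one has an impossible negative payment (in fen)
orders = pd.DataFrame({
    "orderId": [1, 2, 3],
    "payMoney": [900, -100, 2500],
})

# Keep only the rows whose payMoney is not negative
orders = orders[orders["payMoney"] >= 0].copy()

# Convert fen to yuan
orders["payMoney"] = orders["payMoney"] / 100
print(orders)
```

The `.copy()` keeps the later assignment from touching a view of the original frame.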

channelId

According to the result of info, channelId has some null values. These may be due to a bug or a similar cause, where the channelId field was not passed when the order was placed

When the amount of data is large, deleting a small number of null records will not affect the statistical results, so we delete them directly here
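Dropping the rows with a missing channelId might look like this (a sketch on invented data; `orders` and the channel values are assumptions):

```python
import pandas as pd

# Invented rows; one order was placed without a channelId
orders = pd.DataFrame({
    "orderId": [1, 2, 3],
    "channelId": ["a", None, "b"],
})

# dropna with subset only looks at the listed column
orders = orders.dropna(subset=["channelId"])
print(len(orders))
```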

deviceType

The possible values of deviceType can be looked up in the device_type.txt file. There is no problem here, so there is nothing to deal with

createTime and payTime

Neither createTime nor payTime has null values, but we only want statistics for 2016, so the records outside 2016 must be deleted
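A minimal sketch of the 2016 filter. It assumes the filter is applied to createTime; whether payTime should be restricted in the same way is not stated here, so that choice is left open. The table `orders` and its timestamps are invented:

```python
import pandas as pd

# Invented timestamps around the boundaries of 2016
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "createTime": [
        "2015-12-31 23:59:59",
        "2016-01-01 00:00:00",
        "2016-12-31 23:59:59",
        "2017-01-01 00:00:00",
    ],
})

# Parse the strings so that the .dt accessor becomes available
orders["createTime"] = pd.to_datetime(orders["createTime"])

# Keep only the orders created in 2016
orders = orders[orders["createTime"].dt.year == 2016]
print(orders)
```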

Now go back and delete the records with a duplicate orderId

Delete the records where productId is 0
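These last two cleaning steps can be sketched as follows. Keeping the first of each duplicated orderId is an assumption (the text does not say which copy survives), and `orders` with its values is invented:

```python
import pandas as pd

# Invented rows: orderId 2 is duplicated, orderId 3 has productId 0
orders = pd.DataFrame({
    "orderId": [1, 2, 2, 3],
    "productId": [100, 101, 101, 0],
})

# Keep one row per orderId (assumption: the first occurrence wins)
orders = orders.drop_duplicates(subset="orderId", keep="first")

# Remove the rows whose productId is 0
orders = orders[orders["productId"] != 0]
print(orders)
```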

After data cleaning, start analysis

 

4. Data processing and analysis

First look at the overall situation of the data

The total number of orders, the total number of users who placed orders, the total sales, and the number of products that were actually sold
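The four overall figures could be computed like this. It is a sketch on invented, already cleaned data, and it assumes total sales means the sum of payMoney:

```python
import pandas as pd

# Invented, already cleaned orders (payMoney in yuan)
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "userId": [10, 10, 11, 12],
    "productId": [100, 101, 100, 102],
    "payMoney": [9.0, 25.0, 9.0, 5.5],
})

total_orders = orders["orderId"].count()       # number of orders
total_users = orders["userId"].nunique()       # distinct users who ordered
total_sales = orders["payMoney"].sum()         # assumed: sum of the amounts paid
products_sold = orders["productId"].nunique()  # distinct products with a sale
print(total_orders, total_users, total_sales, products_sold)
```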

Data analysis can be considered from two aspects: dimensions and indicators. A dimension can be regarded as the x-axis and an indicator as the y-axis. One dimension can be used to analyze multiple indicators, and the same dimension

Dimensionality reduction

By productId

Let's take a look at the top ten and the last ten products by sales volume

Sales revenue

Look at the intersection of the last 100 products by sales volume and the last 100 by sales revenue. If a product does poorly on both, we need to decide whether to optimize it or take it off the shelves
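A sketch of the ranking and the intersection, assuming sales volume means the number of orders per productId and sales revenue means the sum of payMoney per productId. The data is invented, and the cut-off is shrunk from 100 to 2 so the toy example stays readable:

```python
import pandas as pd

# Invented orders: product 101 sells most often, product 104 least
orders = pd.DataFrame({
    "orderId": list(range(1, 11)),
    "productId": [101] * 4 + [102] * 3 + [103] * 2 + [104],
    "payMoney": [100.0] * 4 + [10.0] * 3 + [100.0] * 2 + [5.0],
})

# Sales volume: number of orders per product, best sellers first
by_volume = orders.groupby("productId")["orderId"].count().sort_values(ascending=False)

# Sales revenue: amount paid per product, highest first
by_revenue = orders.groupby("productId")["payMoney"].sum().sort_values(ascending=False)

n = 2  # the text uses the last 100; 2 is enough for this toy table
weak_volume = set(by_volume.tail(n).index)
weak_revenue = set(by_revenue.tail(n).index)

# Products that sit at the bottom of both rankings
weak_products = weak_volume & weak_revenue
print(by_volume.head(n), by_volume.tail(n), weak_products)
```

With real data, `head(10)` and `tail(10)` on either ranking give the top ten and the last ten.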

price

For price, we can look at how orders are distributed across price ranges, which shows at what prices products sell best
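One way to look at such a distribution is to bin the prices and count the orders in each bin. The bin edges below are arbitrary assumptions, and so are the table `orders` and its values:

```python
import pandas as pd

# Invented prices in yuan
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4, 5, 6],
    "price": [50, 80, 150, 350, 360, 90],
})

# Arbitrary bin edges; pd.cut builds right-closed intervals such as (0, 100]
bins = [0, 100, 200, 300, 400]
price_band = pd.cut(orders["price"], bins=bins)

# Orders per price band, in band order; empty bands show up with a count of 0
band_counts = price_band.value_counts().sort_index()
print(band_counts)
```

In this toy table the (200, 300] band is empty, which is the kind of gap worth checking against the product range.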

 

Many price ranges contain no products at all. If data from competitors is available, it can show whether products should be added in those ranges.

Corresponding price

Order time analysis

Look at the distribution of the number of orders by hour, so that promotions can be scheduled by time of day
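A minimal sketch of the hourly count, assuming the hour is taken from createTime (the table `orders`, the helper column `orderHour` and the timestamps are invented):

```python
import pandas as pd

# Invented order times
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "createTime": pd.to_datetime([
        "2016-03-01 12:05:00",
        "2016-03-01 12:40:00",
        "2016-03-02 13:10:00",
        "2016-03-02 20:00:00",
    ]),
})

# Hour of day (0 to 23) in which each order was created
orders["orderHour"] = orders["createTime"].dt.hour

# Number of orders per hour
per_hour = orders.groupby("orderHour")["orderId"].count()
print(per_hour)
```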

There are more orders at 12:00, 13:00 and 14:00, which is presumably the lunch break, and then again around 20:00

By day of the week, the most orders are placed on Saturday, followed by Friday and Sunday
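The same grouping works for the day of the week. This sketch again assumes createTime is the relevant column and uses invented dates; pandas numbers the days from Monday = 0 to Sunday = 6:

```python
import pandas as pd

# Invented order dates: 2016-01-01 was a Friday
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "createTime": pd.to_datetime([
        "2016-01-01 10:00:00",  # Friday
        "2016-01-02 11:00:00",  # Saturday
        "2016-01-02 15:00:00",  # Saturday
        "2016-01-03 09:00:00",  # Sunday
    ]),
})

# Day of week as an integer, Monday = 0 ... Sunday = 6
orders["weekday"] = orders["createTime"].dt.dayofweek

# Number of orders per weekday
per_weekday = orders.groupby("weekday")["orderId"].count()
print(per_weekday)
```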

How long after an order is placed is it paid

Most payments are completed within ten minutes, which indicates that users rarely hesitate and have a strong intention to buy
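The delay between placing an order and paying for it is the difference between payTime and createTime. A sketch on invented timestamps (`orders`, `delay_seconds` and `share_within_10_min` are names made up for this example):

```python
import pandas as pd

# Invented orders with their creation and payment times
orders = pd.DataFrame({
    "orderId": [1, 2, 3, 4],
    "createTime": pd.to_datetime([
        "2016-05-01 12:00:00",
        "2016-05-01 12:30:00",
        "2016-05-01 13:00:00",
        "2016-05-01 20:00:00",
    ]),
    "payTime": pd.to_datetime([
        "2016-05-01 12:01:00",   # 60 seconds later
        "2016-05-01 12:35:00",   # 300 seconds later
        "2016-05-01 14:00:00",   # 3600 seconds later
        "2016-05-01 20:02:00",   # 120 seconds later
    ]),
})

# Payment delay in seconds
delay_seconds = (orders["payTime"] - orders["createTime"]).dt.total_seconds()

# Share of orders paid within ten minutes (600 seconds)
share_within_10_min = (delay_seconds <= 600).mean()
print(share_within_10_min)
```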

Monthly turnover
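Monthly turnover can be sketched as the sum of payMoney per calendar month. Grouping by the month of createTime is an assumption (payTime would work the same way), and the data is invented:

```python
import pandas as pd

# Invented 2016 orders (payMoney in yuan)
orders = pd.DataFrame({
    "orderId": [1, 2, 3],
    "createTime": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-03"]),
    "payMoney": [10.0, 20.0, 5.5],
})

# Month number (1 to 12); unambiguous here because all data is from 2016
month = orders["createTime"].dt.month

# Total amount paid per month
monthly_turnover = orders.groupby(month)["payMoney"].sum()
print(monthly_turnover)
```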

 


Origin www.cnblogs.com/daisyxxx/p/12683760.html