Reading Notes - Data Analysis

Lu Hui - Excerpts from Chapter 6 of "Data Mining and Data Operation Practice"

A complete data mining project case

1 Project background and business analysis requirements

Background: the main job of an Internet company's "free member operation team" is to continuously cultivate free members and raise their maturity and e-commerce professionalism, so that high-quality free members can be upgraded to paid members promptly once they meet the conditions. Free members are divided into three groups by activity level: high activity, medium activity, and low activity.

The activity grouping is based mainly on two indicators: the number of logins to the website in the past 30 days, and the page views (PV) of a core entry page in the past 30 days.
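As a concrete illustration, the grouping rule above might look like the following sketch; the numeric thresholds are invented for illustration, since the book does not give the actual cutoffs.

```python
# Hypothetical cutoffs; the source does not specify the real thresholds.
def activity_segment(logins_30d: int, core_pv_30d: int) -> str:
    """Assign a free member to an activity group from the two 30-day indicators."""
    if logins_30d >= 10 and core_pv_30d >= 50:
        return "high"
    if logins_30d >= 3 or core_pv_30d >= 10:
        return "medium"
    return "low"
```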

Data analysis business requirement
Highly active free users have always been the operation team's key customer group: the high-activity group has both the highest paid-conversion rate and the largest number of conversions. However, an important problem troubling the operators is that the churn rate of highly active users is relatively high; within a short period, a considerable proportion of highly active free users drop into the medium- or low-activity groups. The requirement is to use data analysis methods to identify, in advance, the highly active users who are most likely to churn.

2 Data analysts participate in requirements discussion

The discussion mainly includes:
1. Collect the background data and indicators relevant to the requirement, and work with the business side to become familiar with the underlying business logic;
2. From the professional perspective of data analysis, evaluate whether the preliminary business analysis requirement is reasonable and feasible.

3 Develop a needs analysis framework and analysis plan

1. Translate the analysis requirement into the definition of the target variable for the data analysis project, i.e. the definition of churn for highly active users.
2. A general description of the analysis approach. For this case, the idea is to target, more accurately and effectively, the user groups most likely to churn.
3. The data extraction rules for the analysis samples.
4. A rough delineation and listing of candidate analysis variables (model input variables): variables that seem meaningful for predicting the target variable, listed roughly from business experience.
5. Consideration of project risks during the analysis, and the main coping strategies.
6. Analysis and outlook of the project's application value, mainly covering 3 aspects:
(1) Once the model is in use, it locks in, ahead of time, the highly active users at high risk of churning, so that operators can carry out targeted retention and service work;
(2) The valuable and most likely important fields and indicators found during modeling are selectively provided to operators as the basis and reference for formulating operation plans and strategies;
(3) The core indicators and fields that affect churn can be provided to the relevant business parties as a basis and reference for their decisions.
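The target-variable definition in step 1 could be made operational along these lines; the 30-day re-observation window and the segment names are assumptions for illustration, not taken from the book.

```python
# A possible operational definition of the target variable: a user who is in the
# "high" activity segment on the observation date is labeled as churned (1) if
# their segment has dropped to "medium" or "low" when re-observed 30 days later.
# The 30-day window and the segment names are assumptions, not from the source.
def churn_label(segment_on_obs_date: str, segment_30d_later: str) -> int:
    if segment_on_obs_date != "high":
        raise ValueError("the target variable is defined only for highly active users")
    return 1 if segment_30d_later in ("medium", "low") else 0
```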

4 Extracting sample data, familiarizing with data, data cleaning and thorough investigation


According to the analysis and modeling ideas discussed in the previous stage, and the initially delineated analysis fields (analysis variables), write code to extract the sample data required for analysis and modeling from the data warehouse. Then thoroughly examine the sample for missing data, dirty data, wrong data, and other obvious quality problems, and clean, eliminate, or transform the affected records. At the same time, depending on the specific business scenario and project requirements, decide whether to generate derived variables, and how to derive them.
Data cleaning:
(1) Handling of missing values (discarding or filling);
(2) Correlation analysis among the input variables to find potential collinearity problems; among highly linearly correlated variables, keep only one;
(3) Handling of fields that the data warehouse's rollback process has left seriously illogical or obviously self-contradictory.
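A minimal pandas sketch of cleaning steps (1) and (2); the column names, toy data, and the 0.9 correlation threshold are all illustrative assumptions, not from the book.

```python
import numpy as np
import pandas as pd

# Toy sample; the column names and values are invented for illustration.
df = pd.DataFrame({
    "logins_30d":   [10, 12, None, 8, 15, 3],
    "core_pv_30d":  [50, 60, 55, 40, 75, 12],
    "pv_per_login": [5.0, 5.0, None, 5.0, 5.0, 4.0],
    "orders_30d":   [2, 0, None, 1, 3, 4],
})

# (1) Missing values: discard rows that are mostly empty, fill the rest with medians.
df = df.dropna(thresh=int(df.shape[1] * 0.5))
df = df.fillna(df.median())

# (2) Collinearity: among variable pairs with |r| above a threshold, keep only one.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=to_drop)
```

In this toy data the core-entry PV is almost proportional to the login count, so only one of the two survives the collinearity filter.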

5 Preliminary construction of the mining model as planned

Main contents:
(1) Further filter the model's input variables. The variables that finally enter the model should follow the general principle of "fewer but better";
(2) Try different mining algorithms and analysis methods, and compare the effect, efficiency and stability of the different schemes;
(3) Sort out the series of core input variables selected by the model that are most relevant to predicting the target variable, and use them as references and suggestions when discussing the application with the business side.
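Point (2), comparing different schemes on the same held-out sample, can be sketched as follows; the two rule-based "models", the synthetic data generator, and the variable names are all hypothetical stand-ins for real candidate algorithms.

```python
import random

random.seed(0)

# Synthetic users: (logins_30d, days_since_last_login, churned). Purely illustrative.
def make_user():
    logins = random.randint(0, 20)
    gap = random.randint(0, 30)
    if logins < 5 and gap > 14 and random.random() < 0.8:
        churned = 1          # few logins plus a long gap: likely churn in this toy world
    elif random.random() < 0.1:
        churned = 1          # background churn rate
    else:
        churned = 0
    return logins, gap, churned

data = [make_user() for _ in range(1000)]
train, valid = data[:700], data[700:]   # hold out a validation sample

# Scheme A: a one-variable rule.   Scheme B: a two-variable rule.
def predict_a(logins, gap):
    return 1 if logins < 5 else 0

def predict_b(logins, gap):
    return 1 if logins < 5 and gap > 14 else 0

def evaluate(predict, rows):
    """Precision and recall of a churn predictor on a sample."""
    tp = fp = fn = 0
    for logins, gap, y in rows:
        p = predict(logins, gap)
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

pa, ra = evaluate(predict_a, valid)
pb, rb = evaluate(predict_b, valid)
```

Here the stricter two-variable rule trades recall for precision; comparing such trade-offs across schemes is exactly the stage-5 exercise.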

6 Discuss the preliminary conclusions of the model with the business side, and propose new ideas and model optimization plans

At this stage, it is necessary to sort out the model's preliminary report and conclusions, refine the main predictive fields, sort and refine the core independent variables, and rank their weights. It is also necessary to communicate and share these with the business side, and on this basis discuss possible directions for optimizing the model, the plan for landing the application, and the precautions to be taken.

7 Re-sample and model according to the optimization plan, refine the conclusions and verify the model

On the basis of the above optimization scheme and the newly added derived variables, re-extract the samples: on the one hand, to verify the earlier important conjectures; on the other hand, to try to build a new model with a better prediction effect.
After the model is built, it cannot be immediately handed to the business side for application; it must first be verified against the latest actual data.
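Verifying against the latest actual data often means re-checking a ranking metric, such as top-decile lift, on an out-of-time sample. A small sketch, with hypothetical scores and labels:

```python
def top_decile_lift(scores_and_labels):
    """Churn rate among the top 10% highest-scored users, divided by the overall rate."""
    ranked = sorted(scores_and_labels, key=lambda t: t[0], reverse=True)
    k = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y for _, y in ranked) / len(ranked)
    return top_rate / base_rate if base_rate else 0.0

# Hypothetical out-of-time sample: (model_score, actually_churned).
latest = [(0.9, 1), (0.85, 1), (0.8, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 0), (0.2, 1), (0.1, 0), (0.05, 0)]
lift = top_decile_lift(latest)
```

If the lift on the newest data is close to what the training sample showed, the model is stable enough to hand over; a sharp drop signals overfitting or concept drift.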

8 Complete the analysis report and implement application suggestions

A detailed and complete project conclusion and application suggestions:
(1) The prediction effect and efficiency of the model, and the results of verifying the model against the latest actual data, i.e. the stability of the model.
(2) The important independent variables, and their corresponding characteristics and patterns, that the model has surfaced as operational references.
(3) The layered recommendations for landing the application, put forward by the data analysts based on the model's effect and efficiency, and the corresponding operational recommendations, including: stratification of customers based on the prediction model's scores, selection of operation channels for the corresponding sub-groups, the theme or hook of the operation copy, the direction and purpose of the operational guidance, the setup of the control group and the operation group, the plan for effect monitoring, etc.
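The score-based stratification and the control/operation-group split in (3) could be sketched like this; the tier cutoffs and the 10% control fraction are assumptions for illustration.

```python
import random

def stratify(users_scores, tiers=(0.8, 0.5), control_frac=0.1, seed=42):
    """Assign each scored user to a churn-risk tier, then split off a control group.

    tiers and control_frac are hypothetical; real cutoffs come from the model's
    score distribution and the operation team's capacity.
    """
    rng = random.Random(seed)
    plan = {}
    for user, score in users_scores:
        tier = "high" if score >= tiers[0] else "medium" if score >= tiers[1] else "low"
        group = "control" if rng.random() < control_frac else "operation"
        plan[user] = (tier, group)
    return plan

plan = stratify([("u1", 0.9), ("u2", 0.6), ("u3", 0.2)])
```

The control group receives no retention campaign, so the later effect evaluation (stages 10-12) can compare churn rates between the two groups within each tier.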

9 Formulate specific landing application plans and evaluation plans
10 The business side implements the landing application plans, and tracks and evaluates the effects
11 After evaluating the actual effects of the landing application plans, continuously revise and improve them
12 Evaluation, summary and feedback of the different operation plans
13 Post-project summary and reflection
