Since the advent of the digital era, every stage of a business can be recorded: all aspects of product sales are logged, and customer behavior, including online behavior, is collected. Enterprises therefore hold multi-dimensional data, including sales data, customer consumption data, customer behavior data, and operations data. Once the data is available, data analysis becomes possible. Classic examples of data analysis, such as Wal-Mart's beer-and-diapers and tarts-and-flashlights cases, or Target inferring that a 16-year-old girl was pregnant, all show how analysis uncovers hidden relationships in data.
1. The value of data analysis
The first is improving efficiency: helping companies process data more efficiently and reduce data storage costs.
The second is guiding the business, for example in precision marketing, fraud detection, risk management, and business improvement.
2. Team and roles
The data analysis team should be an independent department that serves all business units. With its own technical staff, it can build a separate big data computing and analysis platform and use the latest data processing techniques to analyze data and build models.
In addition, some members of the data analysis team should come from the business side. People who are highly sensitive to business data can break business needs down into data requirements, connect data to business scenarios, and combine those scenarios with data analysis.
DBA: processes raw data and provides it to data scientists and data analysts; this data is the foundation of analysis and modeling.
Business expert: guides data modeling with business experience and domain knowledge. By analyzing the business to discover its underlying patterns, the business expert points out the direction for modeling and provides recommendations and interpretations for the models.
Data scientist: uses professional skills to help business experts and analysts with data modeling and computation.
Data analyst: makes recommendations based on the results of the data analysis, completing the key step from raw data to commercial application.
Operations specialist: puts business decisions into practice by planning operational activities that apply the results of data analysis to actual business activities.
3. Preparatory work before data analysis
Data source selection
Data sampling
Data type selection
Missing value handling
Outlier detection and treatment
Data standardization
Coarse classing (categorization) of data
Variable selection
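Several of these preparatory steps are simple enough to sketch in code. The following is a minimal illustration in plain Python covering four of them: missing values (filled with the median), outliers (capped at the common Q1 - 1.5 * IQR and Q3 + 1.5 * IQR fences), standardization (z-scores), and coarse classing (binning at chosen cut points). The function names, the median and IQR choices, the cut points, and the sample data are all hypothetical assumptions for illustration, not a prescribed method.

```python
import bisect
import statistics

# Hypothetical helpers illustrating common preparatory steps.

def impute_median(values):
    """Missing values: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def cap_outliers_iqr(values, k=1.5):
    """Outliers: clip values to the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def z_score(values):
    """Standardization: rescale to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # constant column: nothing to scale
    return [(v - mean) / std for v in values]

def coarse_class(values, cut_points):
    """Coarse classing: map each value to a bin index.

    cut_points must be sorted ascending; bin 0 holds values <= cut_points[0],
    bin i holds values in (cut_points[i-1], cut_points[i]], and the last bin
    holds values above the final cut point.
    """
    return [bisect.bisect_left(cut_points, v) for v in values]

# Usage: a small, made-up "age" column with one missing value and one outlier.
ages = [25, None, 31, 40, 29, 120, 35]
filled = impute_median(ages)        # None -> 33.0 (median of observed ages)
capped = cap_outliers_iqr(filled)   # 120 -> 56.5 (upper IQR fence)
scaled = z_score(capped)
bins = coarse_class(capped, [30, 40])
print(capped)
print(bins)
```

The order matters in this sketch: outliers are capped before standardizing so that a single extreme value does not dominate the mean and standard deviation.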
4. Methods for evaluating data models
(1) AUC (area under the ROC curve)
AUC = 1: a perfect classifier.
AUC in [0.85, 0.95]: good performance.
AUC in [0.7, 0.85]: average performance.
AUC in [0.5, 0.7]: weak performance, although for stock prediction this is already very good.
AUC = 0.5: equivalent to random guessing (for example, tossing a coin); the model has no predictive value.
AUC < 0.5: worse than random guessing; however, if you always invert the model's predictions, it becomes better than random guessing.
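The AUC bands above are easier to interpret through the probabilistic definition of AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes AUC directly from that definition using a simple pairwise loop (fine for small data, O(n_pos * n_neg)); the function name and toy data are hypothetical. It also illustrates the last row of the list: inverting a model's scores turns an AUC of a into 1 - a.

```python
def auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 0.5. labels are 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (made up for illustration).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]

a = auc(labels, scores)                      # 8 of 9 pairs ranked correctly: 8/9
flipped = auc(labels, [-s for s in scores])  # invert the model's ranking
print(a, flipped)  # flipped == 1 - a because there are no tied scores
```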
(2) KS (Kolmogorov-Smirnov) statistic
A KS value greater than 0.2 indicates good predictive ability.
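The KS statistic measures how well a model's scores separate the two classes: it is the largest vertical gap between the cumulative score distributions of the positive and negative examples, which equals the maximum of |TPR - FPR| over all thresholds. A minimal sketch in plain Python follows; the function name and toy data are hypothetical.

```python
import bisect

def ks_statistic(labels, scores):
    """Two-sample KS statistic between the score distributions of
    positives (label 1) and negatives (label 0)."""
    pos = sorted(s for y, s in zip(labels, scores) if y == 1)
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # The empirical CDFs only change at observed scores, so checking each
    # distinct score as a threshold is enough to find the maximum gap.
    for t in sorted(set(scores)):
        cdf_pos = bisect.bisect_right(pos, t) / len(pos)  # share of positives <= t
        cdf_neg = bisect.bisect_right(neg, t) / len(neg)  # share of negatives <= t
        best = max(best, abs(cdf_neg - cdf_pos))
    return best

# Toy data (made up for illustration).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
ks = ks_statistic(labels, scores)
print(ks, ks > 0.2)  # compare against the 0.2 threshold stated in the text
```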