Transportation Data Collection + Data Cleaning and Analysis + Data Visualization

1. Requirements:

1.1 Data Collection

I. Data source 11 (Transportation)
Air travel is fast and convenient, so a growing number of people favor it. Over many years of operation,
an airline has accumulated a large volume of member profiles and flight records. To segment its customers,
identify the high-value customer groups, and focus limited marketing resources on those customers for
maximum profit, the airline hired the big data analytics company "H3CU" to carry out the project.
Because member information is confidential company data, the airline desensitizes (anonymizes) the data
and transfers it to "H3CU" as a csv file for processing and analysis. For security reasons, "H3CU" wants
the data to be backed up in a database first and then cleaned and analyzed further. The relevant technical
staff are asked to complete the task according to the instructions.
GZ-2019032 Big Data Technology and Applications (Higher Vocational Group) competition exam
1. The airline has accumulated a large volume of member profile and flight information, including the
membership card number, enrollment date, gender, age, membership card level, kilometres flown within the
observation window, flight time, number of flights, and other attributes, 44 in total. The data is stored
in csv format.
2. The most widely used model for identifying customer value is the RFM model. R (Recency) is the
interval between the customer's most recent purchase and the cutoff date; the smaller R is, the more
likely the customer is to be interested in the goods or services. F (Frequency) is the number of purchases
the customer makes within a given period; the higher the count, the greater the customer's value.
M (Monetary) is the amount the customer spends within a given period.
3. In this task, customers who spend the same amount are not equally valuable to the airline. For
example, a passenger who buys a long-haul ticket in a low cabin class and a passenger who buys a
short-haul ticket in a high cabin class may pay the same fare, but the latter is probably more valuable
to the airline. Therefore the total flight mileage M and the average discount factor C of the cabin
classes flown are used in place of the amount spent.
4. The length of airline membership also affects customer value to some extent, so the customer
relationship length L is added as another feature. Building the model involves 6 attributes from the
original data: FFP_DATE (enrollment date), load_time (end time of the observation window),
FLIGHT_COUNT (number of flights within the observation window), AVG_DISCOUNT (average discount factor),
SEG_KM_SUM (total kilometres flown within the observation window), and LAST_TO_END (time from the last
flight to the end of the observation window).
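As a minimal sketch of how these six attributes could feed the L, R, F, M, C indicators defined in section 1.3, the snippet below maps one record to the five values. The 30-day month, the assumption that LAST_TO_END is measured in days, the helper name `build_lrfmc`, and the sample record are all illustrative assumptions, not part of the exam text.

```python
from datetime import date

DAYS_PER_MONTH = 30  # assumption: the task does not say how days map to months


def build_lrfmc(record):
    """Map one record's six source attributes to the five LRFMC indicators."""
    membership_days = (record["load_time"] - record["FFP_DATE"]).days
    return {
        "L": membership_days / DAYS_PER_MONTH,          # membership length in months
        "R": record["LAST_TO_END"] / DAYS_PER_MONTH,    # assumes LAST_TO_END is in days
        "F": record["FLIGHT_COUNT"],                    # flights in the window
        "M": record["SEG_KM_SUM"],                      # kilometres flown in the window
        "C": record["AVG_DISCOUNT"],                    # average discount factor
    }


# Hypothetical record, not taken from the real data set.
sample = {
    "FFP_DATE": date(2012, 4, 1),
    "load_time": date(2014, 3, 31),
    "FLIGHT_COUNT": 12,
    "AVG_DISCOUNT": 0.85,
    "SEG_KM_SUM": 24000,
    "LAST_TO_END": 45,
}
lrfmc = build_lrfmc(sample)
print(lrfmc)
```

Keeping the month length in one named constant makes it easy to swap in whatever convention the grading rubric actually expects.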
The tasks are as follows:
1. Using Java or Python, write the given csv data file into a MySQL database; run the code and save
screenshots of the results.
1) Import the modules
2) Connect to the database
3) Create the table and name it
4) Write the data into the database
5) Close the database connection
2. Using a data transfer tool, import the airline data from the MySQL database into the data platform
for data cleaning, and save screenshots of the commands and their results.
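The five sub-steps of task 1 could look roughly like the sketch below. It is written against Python's generic DB-API and uses an in-memory `sqlite3` database as a stand-in so that it runs without a server; with a MySQL driver such as PyMySQL the connection would instead come from `pymysql.connect(...)` and the placeholder would be `%s`. The table name `air_data`, the helper `load_csv_to_db`, the sample columns, and the choice to store every column as TEXT are assumptions made for illustration.

```python
# 1) Import the modules
import csv
import io
import sqlite3


def load_csv_to_db(conn, csv_file, table, placeholder="?"):
    """Create `table` from the csv header and insert every data row.

    Assumes the header names are plain identifiers that need no quoting.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f"{name} TEXT" for name in header)
    marks = ", ".join(placeholder for _ in header)
    cur = conn.cursor()
    # 3) Create the table and name it
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    # 4) Write the data into the database
    rows = [tuple(row) for row in reader]
    cur.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    conn.commit()
    return len(rows)


# Hypothetical two-row sample standing in for the real csv file.
sample_csv = io.StringIO(
    "MEMBER_NO,FFP_DATE,FLIGHT_COUNT\n"
    "1001,2012/04/01,12\n"
    "1002,2013/07/15,3\n"
)

# 2) Connect to the database (sqlite3 here; a MySQL connection in the real task)
conn = sqlite3.connect(":memory:")
inserted = load_csv_to_db(conn, sample_csv, "air_data")
count = conn.execute("SELECT COUNT(*) FROM air_data").fetchone()[0]
print(inserted, count)

# 5) Close the database connection
conn.close()
```

For the real file, `sample_csv` would be replaced by `open(path, newline="", encoding=...)` with whatever path and encoding the supplied csv uses.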

1.2 Data Cleaning and Analysis

The tasks in this phase are: clean and organize the customers' basic information, flight information,
points information, and other user information, then complete the data calculation, analysis, and
visualization.
Perform a statistical analysis of the airline sample data: using Java or Python, read the target data and
carry out data extraction, data exploration, data preprocessing, and feature construction, writing the
results to the specified file as each item requires.
1. Data processing: for each column of the file, extract the number of null values, the maximum, and the
minimum; print the results and save them;
2. Remove the records whose fare (SUM_YR_1, SUM_YR_2) is null, and print the number of rows and columns
after the change;
3. Keep only the records whose fare (SUM_YR_1, SUM_YR_2) is non-zero, whose average discount rate
(avg_discount) is not 0, and whose total flight distance (SEG_KM_SUM) is greater than 0 km; print the
number of rows and columns after the change;
4. Remove the attributes in the original data that are irrelevant to customer value, select the six
relevant attributes required by the task, and print the first five rows;
5. Construct the 5 indicators from the attributes specified in the task;
6. Because the value ranges of the five indicators differ greatly, the data must be normalized: apply
standard-deviation (z-score) normalization and print the first 5 rows of the result;
7. Compute the mean of each column of the normalized data and print it;
8. Compute the 20% trimmed mean of each column of the normalized data and print it;
9. Compute the median of each column of the normalized data and print it;
10. Compute the quantile (the three-quarters quantile) of each column of the normalized data and print it;
11. Compute the covariance of the columns of the normalized data and print it;
12. Print the summary statistics for each column of the normalized data.
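The cleaning and statistics items above can be sketched with the standard library alone. The snippet covers items 2, 3, 6, and 7 to 11 on two columns. It rests on several assumptions that the task text leaves open: "fare non-zero" is read as "at least one of the two yearly fares is non-zero", the z-score uses the population standard deviation, the 20% trimmed mean cuts 20% from each tail, and the sample rows are invented.

```python
import statistics


def clean(rows):
    """Items 2-3: drop null fares, then keep rows passing the three filters."""
    kept = []
    for r in rows:
        if r["SUM_YR_1"] is None or r["SUM_YR_2"] is None:
            continue  # item 2: fare is null
        # Assumption: a fare counts as non-zero if either year is non-zero.
        fare_nonzero = r["SUM_YR_1"] != 0 or r["SUM_YR_2"] != 0
        if fare_nonzero and r["avg_discount"] != 0 and r["SEG_KM_SUM"] > 0:
            kept.append(r)
    return kept


def zscore(values):
    """Item 6: (x - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # statistics.stdev() would be the sample variant
    return [(v - mean) / std for v in values]


def trimmed_mean(values, proportion=0.2):
    """Item 8: mean after cutting `proportion` of the values from each tail."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def covariance(xs, ys):
    """Item 11: sample covariance with an n - 1 denominator."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical rows; the real input would be read from the airline csv file.
rows = [
    {"SUM_YR_1": 1000.0, "SUM_YR_2": 2000.0, "avg_discount": 0.9, "SEG_KM_SUM": 10000},
    {"SUM_YR_1": None, "SUM_YR_2": 500.0, "avg_discount": 0.8, "SEG_KM_SUM": 3000},
    {"SUM_YR_1": 0.0, "SUM_YR_2": 0.0, "avg_discount": 0.7, "SEG_KM_SUM": 2000},
    {"SUM_YR_1": 300.0, "SUM_YR_2": 0.0, "avg_discount": 0.6, "SEG_KM_SUM": 20000},
    {"SUM_YR_1": 800.0, "SUM_YR_2": 900.0, "avg_discount": 0.5, "SEG_KM_SUM": 30000},
    {"SUM_YR_1": 100.0, "SUM_YR_2": 100.0, "avg_discount": 1.0, "SEG_KM_SUM": 40000},
    {"SUM_YR_1": 50.0, "SUM_YR_2": 70.0, "avg_discount": 0.4, "SEG_KM_SUM": 50000},
]

null_fares = sum(r["SUM_YR_1"] is None for r in rows)  # null count for one column
cleaned = clean(rows)
print("rows, columns:", len(cleaned), len(cleaned[0]))

km_z = zscore([r["SEG_KM_SUM"] for r in cleaned])
disc_z = zscore([r["avg_discount"] for r in cleaned])
q3 = statistics.quantiles(km_z, n=4, method="inclusive")[2]  # item 10
print("mean:", statistics.fmean(km_z))
print("20% trimmed mean:", trimmed_mean(km_z))
print("median:", statistics.median(km_z))
print("three-quarters quantile:", q3)
print("covariance:", covariance(km_z, disc_z))
```

In the actual task the same steps would be applied to all five LRFMC columns rather than the two shown here.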

1.3 Data Visualization:

1. Based on your understanding of the airline's LRFMC model, where
L: number of months from the membership enrollment date to the end of the observation window
R: number of months from the customer's most recent flight to the end of the observation window
F: number of flights the customer took within the observation window
M: flight mileage the customer accumulated within the observation window
C: average discount factor of the cabin classes the customer flew within the observation window
use a visualization tool to analyze the given result data and produce the charts for the relevant
categories.


2. The airline membership card is a symbol of membership status and to some extent also reflects the
member's flight mileage: the more miles flown, the higher the membership level, which in turn indicates
the customer's value to the airline. Using the data in the specified table, present it in the chart form
given by the specified legend.
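Since the specified table and legend are not reproduced here, the following is only a rough sketch of the aggregation that such a chart would need: the average mileage per membership level. The tier numbers, the mileage values, and the text bars (a stand-in for a real charting tool) are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (membership level, kilometres flown) pairs.
members = [(4, 12000), (4, 18000), (5, 42000), (6, 95000), (5, 38000)]

# Group the mileage values by membership level.
km_by_tier = defaultdict(list)
for tier, km in members:
    km_by_tier[tier].append(km)

# Average mileage per level, ordered by level.
summary = {tier: fmean(kms) for tier, kms in sorted(km_by_tier.items())}

# Print a simple text bar per level, scaled to the largest average.
scale = max(summary.values())
for tier, avg_km in summary.items():
    bar = "#" * round(30 * avg_km / scale)
    print(f"tier {tier}: {avg_km:>8.0f} km {bar}")
```

The `summary` mapping is the part that carries over: its keys and values would become the categories and heights of whichever chart type the legend calls for.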

 

2. Implementation

Link: https://pan.baidu.com/s/1aY6K2yay8yPJBATMJ3BRFg
Extraction code: 1uz7

Origin blog.csdn.net/weixin_40903057/article/details/90598164