Zhaopin job data analysis with HBase + data visualization

Requirements:

Background
In recent years, with the accelerating development of the IT industry, demand for IT talent has been growing across the country. To clarify its regional development plans, the "XHS Group" company wants to survey and analyze IT job postings in several provinces. Your team will carry out this simulated research and analysis task: crawl job postings from a recruitment website to obtain the company name, place of work, job title, recruitment requirements, and number of recruits, then clean and analyze the data to identify the most popular job postings and the regional differences in average salary.
This is a simulated mission. The project team plans to use a distributed Hadoop deployment, setting up the environment as a server cluster. It will crawl the relevant information from a recruitment website, then clean, organize, compute, present, and analyze the data, aiming for a clearer picture of each city's IT industry.
As a member of the project team's technical staff, you are a core member of this technology demonstration. Complete the demonstration tasks below step by step and submit a technical report. Good luck.
Task 1: Hadoop platform component installation and deployment (15 points)
1) Extract HBase to the specified installation path;
2) Rename the extracted apache-hbase-2.0.1-bin folder to hbase, then enter the hbase folder;
3) Configure the HBase environment variables for the root user only and make them take effect;
4) Modify hbase-site.xml in the HBase installation directory;
5) Modify hbase-env.sh in the HBase installation directory;
6) Modify regionservers in the HBase installation directory;
7) Copy Hadoop's hdfs-site.xml and core-site.xml into HBase/conf;
8) Start HBase and save the output of the start command.
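For step 4, a minimal hbase-site.xml sketch for a fully-distributed setup is shown below. The hostnames master, slave1, and slave2 and the HDFS port 9000 are assumptions for illustration, not values given in the task; substitute your cluster's own NameNode address and ZooKeeper quorum.

```xml
<configuration>
  <!-- HBase data directory on HDFS; host/port are assumed values -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <!-- run in fully-distributed (cluster) mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum hosts; hostnames are assumed values -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
</configuration>
```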
Task 2: Data acquisition (15 points)
Crawl the key fields from a mainstream recruitment website: "company name", "work city", "job requirements", "number of recruits", "salary" (format: 'base salary - cap'), "name" (job title), and "detail" (job details), and save them in a usable format.
1) Create a project named crawposition;
2) Define the fields to be crawled for the task;
3) Construct the corresponding crawler requests;
4) Specify the file storage location;
5) Crawl the key data;
6) Store the data in the HDFS file system.
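A minimal sketch of the field definitions (step 2) and of parsing the 'base salary - cap' salary format, assuming the raw salary string looks like "8000-15000" or "10k-20k"; the field names mirror the task description, and any real spider (e.g. built with Scrapy) would add the site URL and HTML selectors on top of this.

```python
# Fields the crawposition project is asked to capture, per the task text.
FIELDS = ["company", "city", "requirement", "recruits", "salary", "name", "detail"]

def parse_salary(raw):
    """Split a salary string in 'base-cap' format into two integers.

    Assumes the separator is '-' and that a trailing 'k' means thousands;
    both assumptions come from typical listing formats, not the task itself.
    """
    base, cap = raw.replace("k", "000").split("-")
    return int(base), int(cap)
```

For example, `parse_salary("10k-20k")` yields the pair of the base and cap salary as integers, ready to be averaged per region in the later tasks.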
Task 3: Data cleaning and analysis (25 points)
1) Write a data-cleaning program, package it into a jar, and store it;
2) Clean the crawled data and store each cleaned field in a usable format;
3) Save the cleaned results to the HBase database;
4) Select the appropriate fields, write the results to a new table named cleantable, and view the table data;
5) Query the positions whose skill requirements relate to "big data" and write the query results to a new table named table_bigdata;
6) Create a keycount table and count the number of occurrences of each of the following single core skills.
Note: the core skill keywords are: C++, Scala, Flume, Flink, ETL, mathematics, data warehouse, database, HBase, Hadoop, Python, Java, Kafka, Storm, Linux, HBase, Spark.
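The keycount statistic in step 6 can be sketched as a simple substring count over the cleaned job-detail texts. This is an illustrative sketch, not the packaged jar the task asks for: the keyword list below is a lowercase subset of the task's list, and the sample inputs in the usage note are made-up stand-ins for real crawled data.

```python
# Core skill keywords (lowercased subset of the task's list).
KEYWORDS = ["c++", "scala", "flume", "flink", "etl", "hbase",
            "hadoop", "python", "java", "kafka", "storm", "linux", "spark"]

def keyword_counts(details):
    """Count occurrences of each core skill across all job-detail strings.

    Naive substring matching: e.g. 'java' would also match inside
    'javascript'; a production cleaner would use word-boundary matching.
    """
    counts = {k: 0 for k in KEYWORDS}
    for text in details:
        low = text.lower()
        for k in KEYWORDS:
            counts[k] += low.count(k)
    return counts
```

Each resulting (keyword, count) pair would then become one row written to the keycount table in HBase.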
Task 4: Data visualization (20 points)
Visualize the results of the data analysis and present them graphically.
1) Show the total recruitment per region, arranged in descending order, on the front end;
2) Show the average wage of each region on the front end;
3) Show the differences in average wage across the regions.
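The aggregations behind these charts can be sketched as follows: total recruits per region sorted in descending order, and the average wage per region. Rows here are assumed to be (city, recruits, wage) tuples derived from the cleaned data; the tuple layout and the sample data in the test are illustrative assumptions.

```python
from collections import defaultdict

def totals_desc(rows):
    """Total recruits per city, sorted from largest to smallest."""
    totals = defaultdict(int)
    for city, recruits, _wage in rows:
        totals[city] += recruits
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def avg_wage(rows):
    """Mean wage per city, computed as sum / count."""
    sums, counts = defaultdict(int), defaultdict(int)
    for city, _recruits, wage in rows:
        sums[city] += wage
        counts[city] += 1
    return {city: sums[city] / counts[city] for city in sums}
```

The front end would then feed `totals_desc` into a sorted bar chart and `avg_wage` into a per-region comparison chart.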
Task 5: Comprehensive analysis (15 points)
Combine the above data-analysis results to write the following analyses:

1) Based on the results, list the three cities with the largest number of recruitment postings.
2) Based on the regional average-wage analysis, identify the city with the highest average salary.
3) Based on the regional average-wage analysis, find where Hangzhou ranks in average wage.
4) Briefly describe which city you would recommend as most suitable for "XHS Group" to establish an R&D center, and why.
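Question 3 amounts to ranking cities by average wage and reading off one city's position. A minimal sketch, assuming the Task 4 results are available as a city-to-average-wage dict; the figures in the usage example are made-up, not real analysis results.

```python
def wage_rank(avg_wages, city):
    """Return the 1-based rank of `city` when cities are sorted by
    average wage in descending order."""
    ordered = sorted(avg_wages, key=avg_wages.get, reverse=True)
    return ordered.index(city) + 1
```

For example, with assumed averages {"Beijing": 21000, "Hangzhou": 18000, "Chengdu": 15000}, Hangzhou would rank 2nd.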


Implementation:

Link: https://pan.baidu.com/s/1fWoUPRL9KeVsZVpA9ZgXcA
Extraction code: oolu


Origin blog.csdn.net/weixin_40903057/article/details/90599368