Big Data Applications: Final Comprehensive Assignment

Assignment requirements: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/3339


 

Hadoop comprehensive assignment requirements:

1. Upload the CSV file generated by the web crawler assignment to HDFS

The data chosen here comes from the crawler assignment: user reviews of the Douban Top 250 movies.

The selected file is douban.csv, which contains 32829 records in total.

 

 

First, create the local folder /usr/local/bigdatacase/dataset. Then copy the douban250.csv file into this folder, delete its first line, and display the first five records, as shown below:
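A minimal sketch of this step, assuming a POSIX-style shell. The function name `drop_first_line_and_preview` is hypothetical, and the first line is presumably the CSV header row. It uses `tail` plus a temporary file rather than an in-place editor so that it behaves the same on GNU and BSD systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the first line of a file in place,
# then print the first five remaining records.
drop_first_line_and_preview() {
  local csv_file="$1"
  local tmp_file="${csv_file}.tmp"
  # Keep everything from line 2 onward, then replace the original file.
  tail -n +2 "$csv_file" > "$tmp_file" && mv "$tmp_file" "$csv_file"
  # Show the first five records.
  head -n 5 "$csv_file"
}

# Usage (sudo may be needed under /usr/local):
#   mkdir -p /usr/local/bigdatacase/dataset
#   cp douban250.csv /usr/local/bigdatacase/dataset/
#   drop_first_line_and_preview /usr/local/bigdatacase/dataset/douban250.csv
```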

 

Preprocess the CSV file to generate a text file without a header

Edit the pre_deal.sh script to preprocess the CSV data, then run it so that its contents take effect, as shown below:
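The contents of pre_deal.sh are not given in the text, so the following is only a hypothetical stand-in for what such a script might do: convert the comma-separated file into a tab-separated text file. The function name `pre_deal` and the choice of a tab delimiter are assumptions, and the naive comma split assumes no quoted fields that themselves contain commas.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for pre_deal.sh (the author's script is not shown).
# Usage: pre_deal <input.csv> <output.txt>
pre_deal() {
  local infile="$1" outfile="$2"
  # Split each line on commas and re-join the fields with tabs.
  # "$1 = $1" forces awk to rebuild the record using OFS.
  awk -F ',' 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$infile" > "$outfile"
}

# Usage:
#   pre_deal douban250.csv user_table.txt
```

If review text can contain commas, a CSV-aware tool would be a safer choice than plain field splitting.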

 

View the contents of user_table.txt, as shown below:

 

Store user_table.txt under the /usr/local/bigdatacase folder and grant permissions as follows:

Then start Hadoop, create the /bigdatacase/dataset folder on HDFS, and upload user_table.txt to HDFS as follows:

View the first 10 rows of user_table.txt on HDFS, as shown below:
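The three HDFS steps (create the directory, upload the file, preview it) can be sketched with the standard `hdfs dfs` subcommands `-mkdir -p`, `-put`, and `-cat`. The wrapper function `upload_and_preview` is hypothetical, and the sketch assumes Hadoop is already running and `hdfs` is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the HDFS steps described above.
# Usage: upload_and_preview <local_file> <hdfs_dir>
upload_and_preview() {
  local local_file="$1" hdfs_dir="$2"
  local name
  name="$(basename "$local_file")"
  # Create the target directory on HDFS (no error if it already exists).
  hdfs dfs -mkdir -p "$hdfs_dir"
  # Upload the local file into that directory.
  hdfs dfs -put "$local_file" "$hdfs_dir"
  # Print the first 10 rows of the uploaded file.
  hdfs dfs -cat "$hdfs_dir/$name" | head -n 10
}

# Usage:
#   upload_and_preview path/to/user_table.txt /bigdatacase/dataset
```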

Start the MySQL database, Hadoop, and Hive, then enter the command on the Hive command line to create the database dblab, as shown below:

Create an external table, loading the data from the /bigdatacase/dataset directory on HDFS into the Hive warehouse, and display the first ten rows of bigdata_user, as shown below:
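A rough sketch of what these Hive statements might look like, wrapped in a shell function so they can be passed to `hive -e`. The database name dblab, the table name bigdata_user, and the HDFS location come from the text. The column names and types (`movie_name`, `user_name`, `score`, `review`) and the tab field delimiter are assumptions, because the fields of user_table.txt are not listed here.

```shell
#!/usr/bin/env bash
# Print HiveQL that creates the database and an external table over the
# HDFS directory, then previews ten rows.
# ASSUMPTION: all column names/types and the '\t' delimiter are hypothetical.
bigdata_user_ddl() {
  cat <<'EOF'
CREATE DATABASE IF NOT EXISTS dblab;
USE dblab;
CREATE EXTERNAL TABLE IF NOT EXISTS bigdata_user (
  movie_name STRING,
  user_name  STRING,
  score      INT,
  review     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/bigdatacase/dataset';
SELECT * FROM bigdata_user LIMIT 10;
EOF
}

# Usage, assuming the hive CLI is on the PATH:
#   hive -e "$(bigdata_user_ddl)"
```

Because the table is external, dropping it later removes only the Hive metadata and leaves the files under /bigdatacase/dataset in place.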

 

Query the ratings that the first 10 Douban users gave to the movies, as shown below:

Query the user reviews where the film score is 9, as shown below:

View the Douban movies with a score below 8, as shown below:

View the user reviews of the Douban movies with a score below 8, as shown below:
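The four queries above might be written along the following lines. This is a sketch only: the column names `movie_name`, `user_name`, `score`, and `review` are hypothetical placeholders for whatever fields bigdata_user actually has, and the function name `bigdata_user_queries` is likewise invented for illustration. In order, the statements return ten user ratings, the reviews with a score of 9, the movies scoring below 8, and the reviews of those movies.

```shell
#!/usr/bin/env bash
# Print the four example queries as HiveQL.
# ASSUMPTION: column names are hypothetical; adjust them to the real schema.
bigdata_user_queries() {
  cat <<'EOF'
USE dblab;
SELECT user_name, movie_name, score FROM bigdata_user LIMIT 10;
SELECT user_name, movie_name, review FROM bigdata_user WHERE score = 9;
SELECT DISTINCT movie_name, score FROM bigdata_user WHERE score < 8;
SELECT movie_name, user_name, review FROM bigdata_user WHERE score < 8;
EOF
}

# Usage, assuming the hive CLI is on the PATH:
#   hive -e "$(bigdata_user_queries)"
```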

 

Summary: This semester I gained a deeper understanding of Hadoop, including the HDFS file system and MapReduce, as well as of creating databases in Hive and using its structured query features. I also learned more Python. I came to understand the real purpose of this course: I picked up a lot of new knowledge this semester and also reviewed earlier computing topics, which deepened my understanding.

Origin www.cnblogs.com/lb2016/p/11020622.html