Non-relational database training: big data platform and application

I have since graduated and no longer have screenshots of the original code. The experiment documents can be obtained directly at the end of the article.

Big data platform and application

1 Summary

Through analyzing and modeling practical problems and selecting a data model, this course improves students' ability to use non-relational models to solve practical problems. To reach that goal, the course works through a practical, comprehensive case of modest scale: the analysis, design, implementation, debugging, testing, and demonstration of a non-relational database system. Assessment combines process and result: it examines students' ability to understand and solve problems during the design, and it evaluates the practicality and soundness of the final design. In short, the course puts into practice the requirement of cultivating students' ability to solve complex engineering problems, both when designing solutions and when designing and developing systems.

2 Topic design (describe the topic design ideas in detail):

1. Download the data set and preprocess it (this produces a file ready for import)
2. Import the local data set into Hive for data analysis
3. Transfer data between Hive, MySQL, and HBase
4. Use Python for data visualization analysis

Upload the local dataset to the data warehouse Hive

Hive data analysis

Data transfer between Hive, MySQL, and HBase

The data set used in this report is user.zip, which contains a large data set raw_user.csv (20 million records) and a small data set small_user.csv (300,000 records). The small data set small_user.csv is a subset extracted from raw_user.csv. It exists because the first run through the whole experimental process tends to hit various errors and problems, and testing with the small data set first saves a lot of program running time. Once the complete process has run smoothly, the final test can be carried out with the large data set.
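
The idea of cutting a small test file out of the large one can be sketched in a few lines of Python. This is only an illustration: the article does not say how small_user.csv was actually drawn from raw_user.csv, so taking the first rows is an assumption, and `make_sample` is a hypothetical helper name.

```python
import csv
from itertools import islice


def make_sample(src_path, dst_path, n_records):
    """Copy the header plus the first n_records data rows of src_path to dst_path.

    Assumption: the sample is simply the first n_records rows; the article
    does not state how the real small data set was selected.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty input file
            return 0
        writer.writerow(header)
        written = 0
        for row in islice(reader, n_records):
            writer.writerow(row)
            written += 1
    return written
```

Something like `make_sample("raw_user.csv", "small_user.csv", 300_000)` would then produce a file of the size described above.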
The training report is an important part of learning the big data technology stack. It gives a global understanding of how big data technologies are applied together, so that the individual technologies learned can be integrated and practical problems solved by combining them. It covers the installation and use of Linux, MySQL, Hadoop, HBase, Hive, Sqoop, Python, Eclipse, and other systems and software. These are worked into each step of the experiment, which deepens the understanding of each technology.
6. Detailed implementation process or program source code listing:
1. Download and save the experimental data set
1. Download the small data set small_user.csv (300,000 records) from the official website of the reference book
2. Create a directory bigdatacase under /usr/local to run this case
3. Create a directory dataset under /usr/local/bigdatacase/ to hold the data set
4. Move small_user.csv from /home/hadoop/download/ into the dataset directory
5. View the first five records of small_user.csv
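
Since the original screenshots are gone, here is a rough Python sketch of the same directory setup and the "first five records" check. The helper names `stage_dataset` and `head` are made up for illustration; on the real machine these steps would more likely be done with mkdir, mv, and head in a shell.

```python
import shutil
from itertools import islice
from pathlib import Path


def stage_dataset(src_file, case_dir):
    """Create <case_dir>/dataset and move src_file into it; return the new path."""
    dataset_dir = Path(case_dir) / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    target = dataset_dir / Path(src_file).name
    shutil.move(str(src_file), str(target))
    return target


def head(path, n=5):
    """Return the first n lines of a text file, similar to `head -n`."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]
```

With the paths from the steps above this would be `stage_dataset("/home/hadoop/download/small_user.csv", "/usr/local/bigdatacase")` followed by `head(...)` on the returned path. Writing under /usr/local usually needs suitable permissions.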

3 Dataset Preprocessing

1. Delete the line of field names at the top of the file
2. Preprocess the fields: create a script file pre_deal.sh and add the preprocessing commands to it

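
The contents of pre_deal.sh are not reproduced in this article, so the following Python sketch only shows the general shape of such a preprocessing pass. Everything specific in it is an assumption: that the file has a header row, that there is a column called `time` holding values like "2014-12-08 18", and that the output should be tab-separated with a running id in front. The function name `preprocess` is hypothetical.

```python
import csv


def preprocess(src_path, dst_path, time_col="time"):
    """Write a header-free, tab-separated copy of a CSV file.

    Assumed transformations (not taken from the original script):
    - the header line is dropped,
    - a running id starting at 1 is prepended to every record,
    - only the date part of the time column is kept.
    Returns the number of records written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src)  # consumes the header line
        for count, row in enumerate(reader, start=1):
            # "2014-12-08 18" -> "2014-12-08"
            row[time_col] = row[time_col].split(" ")[0]
            dst.write("\t".join([str(count)] + list(row.values())) + "\n")
    return count
```

A call such as `preprocess("small_user.csv", "user_table.txt")` would produce the user_table.txt file that the next section uploads.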

4 Import into the database

1. Start HDFS
2. Upload user_table.txt to HDFS

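
As a stand-in for the missing screenshot, the upload can be sketched as two `hdfs dfs` calls driven from Python. The HDFS target directory /bigdatacase/dataset and the local location of user_table.txt are assumptions, and the helper names are hypothetical. The commands are only built here, not executed, because running them needs a started HDFS.

```python
import subprocess


def hdfs_upload_commands(local_file, hdfs_dir):
    """Build the commands that create hdfs_dir and copy local_file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = hdfs_upload_commands(
    "/usr/local/bigdatacase/dataset/user_table.txt",  # assumed local path
    "/bigdatacase/dataset",                           # assumed HDFS directory
)
```

On a machine where HDFS is running, `run_all(commands)` would perform the upload.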

3. Start Hive and create a database

4. Create an external table
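
The original statement was in a screenshot, so here is only a sketch of what an external-table definition over the uploaded file might look like, generated from Python so that the column list is easy to change. The table name bigdata_user and the uid field appear later in this article; the Hive database name, the other column names and types, and the HDFS location are assumptions. `external_table_ddl` is a hypothetical helper.

```python
def external_table_ddl(database, table, columns, location, delimiter="\\t"):
    """Return a HiveQL CREATE EXTERNAL TABLE statement for a delimited text file."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {database}.{table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Assumed schema; adjust it to the real layout of user_table.txt.
ASSUMED_COLUMNS = [
    ("id", "INT"),
    ("uid", "STRING"),
    ("item_id", "STRING"),
    ("behavior_type", "INT"),
    ("visit_date", "DATE"),
]

ddl = external_table_ddl("dblab", "bigdata_user", ASSUMED_COLUMNS, "/bigdatacase/dataset")
print(ddl)
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the file in HDFS untouched.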

5. Query data

5 Hive data analysis

1. Simple query analysis
A. Query the behavior of the first 10 users toward products

B. Query the time and type of goods purchased by the first 20 users

C. Nested statement

2. Statistical analysis of record counts
A. Use the aggregate function count() to calculate how many rows the table contains

B. Add distinct inside the function to count how many distinct uid values there are
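
To make the two counts concrete without a Hive cluster, the sketch below runs them against an in-memory SQLite table. SQLite is only a stand-in here, not what the experiment used; for these two statements the SQL is the same as in HiveQL. The sample rows and the reduced column list are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigdata_user (id INTEGER, uid TEXT, behavior_type INTEGER)")
conn.executemany(
    "INSERT INTO bigdata_user VALUES (?, ?, ?)",
    [(1, "u1", 1), (2, "u1", 4), (3, "u2", 1), (4, "u3", 1)],  # invented sample rows
)

# A. Total number of rows in the table.
total_rows = conn.execute("SELECT count(*) FROM bigdata_user").fetchone()[0]

# B. Number of distinct uid values.
distinct_uids = conn.execute("SELECT count(DISTINCT uid) FROM bigdata_user").fetchone()[0]

print(total_rows, distinct_uids)
```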

3. Keyword condition query analysis
A. Query with a condition that restricts a keyword to a range

B. Fix a keyword to a given value and use that as the condition for analyzing other data
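
A small sketch of both kinds of condition, again using in-memory SQLite purely as a runnable stand-in for Hive. The column names visit_date and behavior_type, the meaning of the codes (1 for browsing, 4 for purchasing), and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigdata_user (uid TEXT, behavior_type INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO bigdata_user VALUES (?, ?, ?)",
    [
        ("u1", 1, "2014-12-09"),
        ("u1", 4, "2014-12-11"),
        ("u2", 1, "2014-12-12"),
        ("u2", 1, "2014-12-14"),
        ("u3", 4, "2014-12-12"),
    ],
)

# A. Range condition: browse records between two dates.
#    ISO-formatted date strings compare correctly as text.
browse_in_range = conn.execute(
    "SELECT count(*) FROM bigdata_user "
    "WHERE behavior_type = 1 AND visit_date BETWEEN ? AND ?",
    ("2014-12-10", "2014-12-13"),
).fetchone()[0]

# B. Fixed-value condition: who purchased on one given day.
buyers_on_day = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT uid FROM bigdata_user "
        "WHERE behavior_type = 4 AND visit_date = ? ORDER BY uid",
        ("2014-12-12",),
    )
]

print(browse_in_range, buyers_on_day)
```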

4. Analysis by user behavior
A. Query the purchase ratio or browsing ratio of a product on a given day

B. Query the proportion of one user's clicks on the website on a given day relative to all clicks that day

C. Given a range for the number of purchased goods, query the ids of users whose purchases on the website on a given day fall within that range
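
The three behavior queries can be sketched as follows. SQLite in memory stands in for Hive so that the example runs anywhere; the CASE WHEN and HAVING constructs used here also exist in HiveQL. Column names, behavior codes (1 for click or browse, 4 for purchase), and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bigdata_user (uid TEXT, item_id TEXT, behavior_type INTEGER, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO bigdata_user VALUES (?, ?, ?, ?)",
    [
        ("u1", "i1", 1, "2014-12-12"),
        ("u1", "i2", 1, "2014-12-12"),
        ("u1", "i1", 4, "2014-12-12"),
        ("u2", "i1", 1, "2014-12-12"),
        ("u2", "i3", 4, "2014-12-12"),
        ("u2", "i4", 4, "2014-12-12"),
        ("u3", "i1", 1, "2014-12-11"),
    ],
)
day = "2014-12-12"

# A. Share of that day's records that are purchases.
purchase_ratio = conn.execute(
    "SELECT SUM(CASE WHEN behavior_type = 4 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) "
    "FROM bigdata_user WHERE visit_date = ?",
    (day,),
).fetchone()[0]

# B. One user's clicks as a share of all clicks that day.
click_share = conn.execute(
    "SELECT SUM(CASE WHEN uid = ? THEN 1 ELSE 0 END) * 1.0 / COUNT(*) "
    "FROM bigdata_user WHERE behavior_type = 1 AND visit_date = ?",
    ("u1", day),
).fetchone()[0]

# C. Users whose number of purchases that day lies in a given range.
users_in_range = [
    row[0]
    for row in conn.execute(
        "SELECT uid FROM bigdata_user WHERE behavior_type = 4 AND visit_date = ? "
        "GROUP BY uid HAVING COUNT(*) BETWEEN ? AND ? ORDER BY uid",
        (day, 2, 5),
    )
]

print(purchase_ratio, click_share, users_in_range)
```

Multiplying by 1.0 forces floating-point division, which matters in engines where dividing two integers truncates.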

5. User real-time query analysis
Query the number of times users in a given place browsed the website that day

6 Data transfer between Hive, MySQL, and HBase

1. Create a temporary table user_action

2. Insert the data in the bigdata_user table into user_action (execution time: about 10 seconds)

3. Log in to MySQL (import from Hive into MySQL)
4. Create a database

5. Create a table
Next, create a new table user_action in the MySQL database dblab, and set its encoding to utf-8:

6. Import data
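
Sqoop is listed among the tools used, so the import into MySQL was presumably a `sqoop export` of the Hive table's files. The sketch below only assembles such a command. The JDBC URL, the MySQL user name, and the warehouse directory are assumptions, and `sqoop_export_command` is a hypothetical helper; the command is built but not run.

```python
def sqoop_export_command(jdbc_url, username, table, export_dir, delimiter="\\t"):
    """Build a `sqoop export` command that loads delimited HDFS files into an RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", delimiter,
    ]


export_cmd = sqoop_export_command(
    "jdbc:mysql://localhost:3306/dblab",        # assumed MySQL host and port
    "root",                                     # assumed MySQL user
    "user_action",
    "/user/hive/warehouse/dblab.db/user_action",  # assumed Hive warehouse path
)
print(" ".join(export_cmd))
```

The export directory has to point at wherever Hive actually stores the user_action table, and the field delimiter has to match the one used when that table was created.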

7. View the user_action table data in MySQL

8. Start HBase (import from MySQL into HBase)

9. Create the table user_action

10. Import data
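
For the MySQL to HBase direction, a `sqoop import` with HBase options is one common way to do it. As a hedged sketch, the command could be assembled like this. The column family name f1, the row key column id, the JDBC URL, and the user name are assumptions, and the helper name is hypothetical; nothing is executed.

```python
def sqoop_hbase_import_command(jdbc_url, username, table, hbase_table,
                               column_family, row_key, mappers=1):
    """Build a `sqoop import` command that writes an RDBMS table into an HBase table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password
        "--table", table,
        "--hbase-table", hbase_table,
        "--column-family", column_family,
        "--hbase-row-key", row_key,
        "-m", str(mappers),
    ]


import_cmd = sqoop_hbase_import_command(
    "jdbc:mysql://localhost:3306/dblab",  # assumed MySQL host and port
    "root",                               # assumed MySQL user
    "user_action",
    "user_action",
    "f1",                                 # assumed column family
    "id",                                 # assumed row key column
)
print(" ".join(import_cmd))
```

The column family passed here must already exist in the HBase table created in the previous step.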

11. View the user_action table data in HBase

12. Data preparation (use HBase Java API to import data from local to HBase)

13. Write a data importer

14. Export it as a jar package

15. Empty the user_action table

16. Use the hadoop jar command to run the program

17. View the user_action table data in HBase

7 Python for data visualization analysis

1. Analyze consumer behavior toward products
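
The original charts are not available, so here is a minimal sketch of the counting behind such a chart, using only the standard library and a text bar chart instead of a plotting package. The meaning of the behavior codes (1 browse, 2 favorite, 3 add to cart, 4 purchase) and the record layout `(uid, item_id, behavior_type)` are assumptions; the function names are hypothetical.

```python
from collections import Counter

# Assumed meaning of the behavior_type codes.
BEHAVIOR_NAMES = {1: "browse", 2: "favorite", 3: "add to cart", 4: "purchase"}


def behavior_counts(records):
    """Count how often each behavior type occurs in (uid, item_id, behavior_type) records."""
    return Counter(behavior for _uid, _item, behavior in records)


def text_bar_chart(counts, width=40):
    """Render the counts as text bars scaled so the largest count fills `width` characters."""
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for code in sorted(counts):
        bar = "#" * max(1, round(width * counts[code] / top))
        label = BEHAVIOR_NAMES.get(code, str(code))
        lines.append(f"{label:<12} {bar} {counts[code]}")
    return lines


sample = [("u1", "i1", 1), ("u1", "i2", 1), ("u2", "i1", 1), ("u2", "i1", 4)]  # invented rows
for line in text_bar_chart(behavior_counts(sample)):
    print(line)
```

The same counts could be handed to a plotting library to draw a histogram of behavior types.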

2. Analyze the top 10 best-selling items and their sales volume
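
A top-10 ranking like this comes down to counting purchase records per item. A minimal sketch, assuming that behavior code 4 means a purchase and that records are `(uid, item_id, behavior_type)` tuples; `top_selling_items` is a hypothetical helper name.

```python
from collections import Counter

PURCHASE = 4  # assumed behavior code for a purchase


def top_selling_items(records, n=10):
    """Return the n most purchased items as (item_id, purchase_count) pairs."""
    sales = Counter(item for _uid, item, behavior in records if behavior == PURCHASE)
    return sales.most_common(n)


sample = [
    ("u1", "i1", 4),
    ("u2", "i1", 4),
    ("u1", "i2", 4),
    ("u3", "i3", 1),  # a browse record, ignored by the ranking
]
print(top_selling_items(sample))
```

The resulting pairs can be fed straight into a bar chart with item ids on one axis and sales counts on the other.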

8 References:

Big Data Basic Programming, Experiments and Case Tutorials
Lin Ziyu's blog
NoSQL Database Principles
Big Data Technology: Principles and Applications

9 Implementation environment (system environment and development software used):

Linux: Ubuntu (VMware Workstation Pro)

10 Summary and experience (problems encountered or personal takeaways from completing the project):
Included in the experiment document.

Follow the official account: Timewood
Reply: Non-relational data training
to get the related code, data, and documents.
For more university course experiments and training projects, follow the official account and reply with the relevant keywords.
I am still learning, so if there are any mistakes, please point them out.

Origin blog.csdn.net/qq_43374681/article/details/118365320