2023 Vocational College Skills Competition Secondary Vocational Group - Big Data Application and Service Competition Task Book Test Questions


1. Competition content
This test paper includes three modules: database system operation and maintenance, data collection and processing, and big data application development. The paper is worth 100 points in total.
2. Competition time
Competition time: 240 minutes in total.
3. Requirements for competition matters
1. Contestants are not allowed to bring communication equipment and other items into the venue. It is strictly prohibited to mark any information of the competition team anywhere in the program and running results. Violators will be treated as cheating.
2. Using the competition environment description provided by the organizers, check that the listed hardware, software, and materials are complete and that the computer equipment works normally.
3. Before the end of the competition, consolidate the operation process and result data of each test question, and submit the complete documents, code, screenshots, and other competition results to the designated directory.
4. At the end of the competition, the submission channel will be closed on the back end. No items used in the competition (including test papers and scratch paper) may be taken out of the competition venue.

Module 1: Database system operation and maintenance (25 points)

Task 1: Database system construction (10 points)

[Task Requirements]
This stage requires the root user to complete the relevant configuration in order to build, configure, and use the MySQL database system.
[Task Requirements Background]
As data volumes grow, the data of a single department usually cannot meet the needs of the entire enterprise. A database is then needed to consolidate the data from the various departments into one system, so that data can be shared and information can flow between departments. Work such as material management, software project management, and personnel management requires corresponding databases to be built and maintained so that the relevant data can be managed properly.
[Specific tasks]
1. Add the user and group for the MySQL database system, and paste a screenshot of the complete command into the answer report;
2. Unzip the MySQL installation package to the /usr/local path, and paste a screenshot of the complete command into the answer report;
3. In the /usr/local directory, rename the decompressed MySQL package to mysql, and paste a screenshot of the complete command into the answer report;
4. Change the ownership of the mysql folder in the /usr/local/ directory to the mysql group, and paste a screenshot of the complete command into the answer report;
5. In the /usr/local/mysql directory, initialize the MySQL database system (the command must be executed in the mysql directory; pay attention to the output text, which contains the commands for modifying the root password and starting mysql), and paste the complete command and a screenshot of the successful initialization into the answer report;
6. In the /usr/local/mysql/ directory, execute the command that starts the service once the MySQL database has been initialized, and paste the complete command and a screenshot of the successful startup into the answer report;
7. In the /usr/local/mysql/ directory, set the password of the MySQL database login user root, and paste the complete command and a screenshot of the successful result into the answer report;
8. In the /usr/local/mysql/ directory, set the password of the MySQL database login user root, and paste a screenshot of the complete command into the answer report;
9. In the /usr/local/mysql/ directory, copy the /usr/local/mysql/support-files/my-medium.cnf configuration file to the /etc directory, add or modify the maximum number of connections to the MySQL database, save the modified configuration file, and paste the complete command and a screenshot of the modified configuration into the answer report;
10. Log in to the MySQL database system as the root user, view all tables in the mysql database, and paste the complete command and a screenshot of the result into the answer report;

Task 2: Housing database system operation and maintenance (15 points)

[Task Requirements]
This stage requires the use of the MySQL database system to create a database and tables, and to add, delete, modify, and query the rental information of users in various cities.
[Task Requirements Background]
The goal is to understand the overall rental situation in various cities in order to serve customers better. By analyzing and visualizing the rental information data, we can obtain important information such as the area, price, and location of each house, which helps us understand the overall state of rental housing in a city. It is therefore necessary to establish a housing information management system that is managed and maintained through a MySQL database.
[Specific tasks]
1. In MySQL, create a database named tenantdb and view the databases. Paste the complete command and a screenshot of the result into the answer report;
2. In MySQL, select and use the tenantdb database just created. Paste the complete command and a screenshot of the result into the answer report;
3. Create a data table named rental_info in the tenantdb database. The fields it contains are shown in the table below. Specify the user_id field as the primary key; this field must be NOT NULL and auto-incrementing. The storage engine is InnoDB, the default character set is utf8, and each field type should match the field's actual meaning. Paste the complete command and a screenshot of the result into the answer report;

Table 1 rental_info table field description:

Field       Description
user_id     Tenant ID
user_name   Tenant name
sex         Tenant gender
age         Tenant age
address     Home address

4. In MySQL, view the structure of the rental_info table just created, and paste the complete command and a screenshot of the result into the answer report;
5. In MySQL, modify the structure of the rental_info table: rename the field sex to user_sex and add the field rental_address (rental address), whose type should match its actual meaning. Paste the complete command and a screenshot of the result into the answer report;
6. In the Mysql library, insert three pieces of tenant rental information into the rental_info table:

  • Xiao Zhang, male, 29, from Tianfu New District, Wuhou District, Chengdu City;
  • Xiao Li, male, 27, from Chenghua New District, Chengdu High-tech Zone;
  • Xiao Wang, male, 32, from Tianfu New District, Jinjiang District, Chengdu City.

Paste the complete command and a screenshot of the result into the answer report;
7. After inserting the data, query the table data, and paste the complete command and a screenshot of the result into the answer report;
8. In the rental_info table, update the record with user_id 1: change the name to Zhang San and the age to 35. Paste the complete command and a screenshot of the result into the answer report;
9. After modifying the rental_info table data, query the table data, and paste the complete command and a screenshot of the result into the answer report;
10. Delete the record whose name is Xiao Li from the rental_info table;
11. After deleting the rental_info table data, query the table data, and paste the complete command and a screenshot of the result into the answer report.

Module 2: Data collection and processing (30 points)

Task 1: Second-hand housing data collection (10 points)

[Task Requirements]
This stage requires the use of the libraries imported in the project file to collect and save housing information.
[Task Requirements Background]
There is a massive amount of data on the Internet. Collecting it by hand is inefficient and cumbersome, so obtaining data sources efficiently has become a primary concern. This project uses web crawler technology to collect data: it grabs housing data from the "Second-hand Housing Information Query Website" and stores it.
[Specific tasks]
1. Use Google Chrome in the virtual machine to access the "Second-hand Housing Information Query Website" at http://127.0.0.1:5000. The website homepage is rendered as follows;
[Figure: website homepage]
2. Click a city label to jump to the corresponding page. Taking "Chengdu" as an example, the "Chengdu Second-hand House Information" page is shown as follows;
[Figure: "Chengdu Second-hand House Information" page]
3. Use PyCharm to open the "House" project on the desktop and write code in the "crawl_house.py" file under the "spider" package. The code captures second-hand housing data for seven cities (Beijing, Guangzhou, Tianjin, Shenzhen, Foshan, Nanning, and Taiyuan) from the "Second-hand Housing Information Query Website" and saves it to xlsx files named by city. The xlsx files are stored in the [spider/house_data/] directory of the "House" project. If the directory does not exist, you need to create it yourself.
The requirements for the captured second-hand housing data and the file names are as follows:

File name: City name_house.xlsx (such as "Beijing_house.xlsx")
Columns: layout, area, orientation, number of floors, age, total price, price per square meter
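
The crawl described in step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the task book does not show the page markup or the per-city URL pattern, so the `/<city>` path, the idea that each listing is a table row of seven `<td>` cells, and all function names here are hypothetical. It uses the standard library for fetching and parsing, and pandas (with openpyxl installed) for the xlsx output; the project's own imported libraries may differ.

```python
import os
from html.parser import HTMLParser
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # address given in the task book
CITIES = ["Beijing", "Guangzhou", "Tianjin", "Shenzhen",
          "Foshan", "Nanning", "Taiyuan"]
COLUMNS = ["Layout", "Area", "Orientation", "Floors",
           "Age", "Total price", "Price per square meter"]
OUT_DIR = os.path.join("spider", "house_data")


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    ASSUMPTION: listings are rendered as an HTML table. Adapt the tag
    names to the real markup of the site.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            self._in_cell = False
            if self._row:  # skip header rows that only contain <th>
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_listings(html):
    """Return the rows that have exactly one cell per expected column."""
    parser = RowParser()
    parser.feed(html)
    return [row for row in parser.rows if len(row) == len(COLUMNS)]


def crawl_city(city):
    # ASSUMPTION: hypothetical URL pattern; check the real link of each city label.
    with urlopen(f"{BASE_URL}/{city}") as resp:
        return parse_listings(resp.read().decode("utf-8"))


def save_city(city, rows):
    import pandas as pd  # writing .xlsx requires openpyxl

    os.makedirs(OUT_DIR, exist_ok=True)  # create the directory if missing
    path = os.path.join(OUT_DIR, f"{city}_house.xlsx")
    pd.DataFrame(rows, columns=COLUMNS).to_excel(path, index=False)
    return path


def main():
    for city in CITIES:
        save_city(city, crawl_city(city))
```

Calling `main()` with the site running would write one file per city, for example spider/house_data/Beijing_house.xlsx.
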
Task 2: Cleaning of housing information data (10 points)

[Task Requirements]
This stage requires the use of Excel to process data files, including sorting, filtering, and data annotation.
[Task Requirement Background]
In the era of data assets, people generate data and data drives social development; data is everywhere. As science and technology advance, the amount of data generated in daily life keeps growing, so filtering the data that is really needed out of this mass is critical. This task uses Excel to process the data and filter out the properties that meet the requirements.
[Specific tasks]
1. Use Excel to open the housing information files of "Beijing" and "Shenzhen" saved in Task 1;
2. Filter out the 20 listings with an area of 90-100 m² and the lowest price. Add a new column "intention ranking" after the last column. With the rows ordered by price from low to high, mark the first 10 rows as "high-quality housing" and the last 10 rows as "general housing". After completion, paste the corresponding screenshot into the answer report.
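
The task itself must be done in Excel, but the same filter can be reproduced in pandas to cross-check the Excel result. The sketch below assumes numeric columns named "Area" and "Total price" (hypothetical names), takes "lowest price" to mean lowest total price, and treats the 90-100 range as inclusive; the function name is also made up for illustration.

```python
import pandas as pd


def rank_listings(df, area_col="Area", price_col="Total price"):
    """Keep the 20 cheapest listings of 90-100 m² and label them."""
    # ASSUMPTION: both columns are numeric and the range is inclusive.
    in_range = df[(df[area_col] >= 90) & (df[area_col] <= 100)]
    # nsmallest returns the rows ordered by price from low to high.
    top = in_range.nsmallest(20, price_col).reset_index(drop=True)
    top["Intention ranking"] = [
        "High-quality housing" if i < 10 else "General housing"
        for i in range(len(top))
    ]
    return top
```
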

Task 3: New house data processing (10 points)

[Task Requirements]
This stage requires the use of NumPy, Pandas, and other data processing tools to process the new house information data and save the processed data.
[Task Requirement Background]
Data processing can divide a large amount of potentially messy data into different categories and organizations, provide people with useful, meaningful, and easy-to-understand information, and help people manage and use data more efficiently. In modern society, data processing runs through various fields and has become a key link in people's activities such as data classification, organization, coding, storage, query and maintenance. There is now a batch of new house information data. Each new house information includes fields such as layout, area, orientation, number of floors, age, and price. Abnormal data in the original data needs to be processed for subsequent use.
[Specific tasks]
1. Use PyCharm to open the "House" project on the virtual machine desktop and write code in the "clean_house.py" file under the "clean_data" package. This file is used to clean and save the new house data.
2. The data to be cleaned is saved under "data" in the "clean_data" package of the "House" project. The cleaning requirements are as follows:
(1) Split the "Price" column into two columns, named "Total Price" and "Average Price".
(2) For rows with a missing "Price", if another row has the same "Area", fill in the "Price" from that row; if no such row exists, delete the row with the missing value.
(3) Delete all rows with missing values in other columns.
3. Save the processed data as xlsx files named by city (such as "Beijing_new_house.xlsx") and store them in the [clean_data/house_data/] directory of the "House" project. If the directory does not exist, you need to create it yourself.
4. Use Excel to open the files saved in the [clean_data/house_data/] directory, sort by the "Area" column, and paste the corresponding screenshots on the answer report after completion.
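
The three cleaning rules in step 2 might be sketched in pandas as follows. The task book does not show how the total and average price are encoded inside the "Price" column, so the separator (`sep`, a single space by default) is purely an assumption, as are the function names. Rows with gaps in the other columns are dropped first, which gives the same result as rule (3) and keeps the grouping by "Area" free of missing keys.

```python
import os

import pandas as pd


def clean_new_house(df, sep=" "):
    """Apply cleaning rules (1)-(3) to one city's new house data."""
    df = df.copy()
    # Rule (3): drop rows with missing values in any column other than "Price".
    other_cols = [c for c in df.columns if c != "Price"]
    df = df.dropna(subset=other_cols)
    # Rule (2): fill a missing price from a row with the same area.
    # "first" yields the first non-missing price in each area group.
    known_price = df.groupby("Area")["Price"].transform("first")
    df["Price"] = df["Price"].fillna(known_price)
    df = df.dropna(subset=["Price"])  # no donor row existed
    # Rule (1): split "Price" into two columns.
    # ASSUMPTION: hypothetical "<total><sep><average>" encoding.
    parts = df["Price"].str.split(sep, n=1, expand=True)
    df["Total Price"] = parts[0]
    df["Average Price"] = parts[1]
    return df.drop(columns="Price").reset_index(drop=True)


def save_city(df, city, out_dir=os.path.join("clean_data", "house_data")):
    os.makedirs(out_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(out_dir, f"{city}_new_house.xlsx")
    df.to_excel(path, index=False)  # requires openpyxl
    return path
```
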

Module 3: Big data application development (45 points)

Task 1: Data analysis and visualization based on Tableau (10 points)

[Task Requirements]
This stage requires the use of the data visualization tool Tableau to build visualizations from the housing information data;
[Task Requirements Background]
The goal is to understand the overall rental situation in various cities in order to serve customers better. By analyzing and visualizing rental information data, we can obtain important information such as housing area, housing price, and housing location, which helps us understand the overall rental situation in a city.
[Specific tasks]
1. The rental data is stored in "Rental Information in Each City.csv" under "draw_price" on the Windows desktop. Use the data visualization tool Tableau to connect to this csv data source and draw a line chart of rental price against area. The X-axis label is displayed as area, the Y-axis label is set to the highest rental price, and the title is set to "Area-House Price Trend Chart";
2. Using the "Rental Information in Each City.csv" data table, connect Tableau to the csv data source, count the houses for sale in each area by "Housing Location", and draw a bar chart of the counts. The X-axis scale labels display the location names, the Y-axis label displays the number of properties for sale, and the title is set to "Comparison chart of properties for sale in each location";
3. The font size of the X-axis scale label is a custom size, and the font size of the Y-axis scale label is a custom size;
4. Take a screenshot of the completed chart and paste it into the corresponding position on the answer report.

Task 2: Data analysis and visualization based on Excel (10 points)

[Task Requirements]
This stage requires the use of Excel to analyze and process the rental price data table for each city and to visualize the results;
[Task Requirements Background]
The goal is to understand the overall rental situation in each city in order to serve customers better. By analyzing and visualizing rental information data, we can obtain important information such as housing area, housing price, and housing location, which helps us understand the overall rental situation in a city.
[Specific tasks]
1. Use Excel to open "Rental Information in Each City.csv" in the "draw_price" folder on the Windows desktop and use this data source to draw a bar chart. Set the X-axis label to the city name, the Y-axis label to the average rental price, and the title to "Comparison Chart of Average Rental Prices in Various Cities". The font size of the X-axis and Y-axis scale labels is 8;
(1) Insert a pivot table in a new worksheet, with all city names as column A and the average rental price as column B;
(2) Use the city name as the X-axis data and the average price as the Y-axis data to draw the bar chart;
(3) Set the column color to blue; fill the chart with orange and give it a black border;
2. Use Excel to open "Rental Information in Each City.csv" in the "draw_price" folder on the Windows desktop, filter out the data for the city "Beijing", count the number of houses of each house type, and draw a donut chart of the house type distribution from these counts.
3. Take a screenshot of the completed chart and paste it into the corresponding location on the answer report.

Task 3: Draw a line chart of rental data based on Python (10 points)

[Task Requirements]
This stage requires the use of the PyCharm development tool and libraries such as NumPy, Pandas, Matplotlib, and Seaborn to draw a line chart of rental price against area from the housing information data;
[Task Requirements Background]
The goal is to understand the overall rental situation in each city in order to serve customers better. By analyzing and visualizing rental information data, we can obtain important information such as housing area, housing price, and housing location, which helps us understand the overall rental situation in a city.
[Specific tasks]
1. The rental information is stored in "Rental Information in Each City.csv" under "draw_price" in the "House" project on the virtual machine desktop. Draw the line chart in draw_img1.py in the same directory;
2. Use the Pandas library to read the CSV file. Take the area as the X-axis data and the corresponding price as the Y-axis data, grouped by city name. Use the Matplotlib library to draw one line per city in a single chart that compares how the price changes with area in each city;
3. The title is set to: Comparison of rental prices in various cities;
4. The X-axis label displays the house area, and the Y-axis label displays the rental price;
5. The font size of the X-axis scale label is 10, and the font size of the Y-axis scale label is 10;
6. Save the drawn picture to the "Img" path of the "House" project and name it "line.png". If the directory does not exist, you need to create it yourself.
7. Take a screenshot of the completed drawing and paste it into the corresponding location on the answer report.
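
Steps 2 to 6 could look roughly like the sketch below. The column names of the CSV file are not given in the task book, so `city`, `area`, and `price` are hypothetical defaults, and averaging the prices of listings that share the same area is an assumption made here to get one point per area on each line.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def draw_line_chart(df, out_dir="Img",
                    city_col="city", area_col="area", price_col="price"):
    """Draw one price-versus-area line per city and save it as line.png."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for city, group in df.groupby(city_col):
        # ASSUMPTION: average the prices of listings with the same area.
        trend = group.groupby(area_col)[price_col].mean().sort_index()
        ax.plot(trend.index, trend.values, label=city)
    ax.set_title("Comparison of rental prices in various cities")
    ax.set_xlabel("House area")
    ax.set_ylabel("Rental price")
    ax.tick_params(axis="both", labelsize=10)  # scale label font size 10
    ax.legend()
    os.makedirs(out_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(out_dir, "line.png")
    fig.savefig(path)
    plt.close(fig)
    return path
```

In draw_img1.py this might be called as `draw_line_chart(pd.read_csv(csv_path))`, where `csv_path` points at the rental information file and the column arguments are adjusted to its real headers.
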

Task 4: Draw a scatter plot of rental data based on Python (10 points)

[Task Requirements]
This stage requires the use of the PyCharm development tool and libraries such as NumPy, Pandas, Matplotlib, and Seaborn to draw a scatter plot of the highest rental prices in each city from the housing information data;
[Task Requirements Background]
The goal is to understand the overall rental situation in each city in order to serve customers better. By analyzing and visualizing rental information data, we can obtain important information such as housing area, housing price, and housing location, which helps us understand the overall rental situation in a city.
[Specific tasks]
1. The rental information is stored in "Rental Information in Each City.csv" under "draw_price" in the "House" project on the virtual machine desktop. Draw the scatter plot in draw_img2.py in the same directory;
2. Use the Pandas library to read the CSV file and the Matplotlib library to draw a scatter plot of housing prices in each city;
(1) Group the data by city name;
(2) Divide the "housing area" into 7 categories as the X-axis data: "Below 50㎡", "50㎡~80㎡", "80㎡~100㎡", "100㎡~120㎡", "120㎡~150㎡", "150㎡~200㎡", and "Above 200㎡". Use the average house price in each area category as the Y-axis data to draw the scatter plot;
(3) The title is set to: Distribution chart of average rental prices in different areas of each city;
(4) The X-axis label displays the house area, and the Y-axis label displays the average rental price;
(5) The font size of the X-axis scale label is 8, and the font size of the Y-axis scale label is 8;
(6) Save the drawn scatter plot to the "Img" path of the "House" project and name it "scatter.png". If the directory does not exist, you need to create it yourself.
3. Take a screenshot of the completed drawing and paste it into the corresponding location on the answer report.
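
The binning and averaging in step 2 can be sketched with `pandas.cut` as shown below. The column names (`city`, `area`, `price`) and function names are hypothetical, and the task book does not say which category a boundary value such as exactly 50㎡ belongs to; this sketch assumes left-closed intervals, so 50 falls into "50㎡~80㎡".

```python
import os

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

BINS = [0, 50, 80, 100, 120, 150, 200, float("inf")]
LABELS = ["Below 50㎡", "50㎡~80㎡", "80㎡~100㎡", "100㎡~120㎡",
          "120㎡~150㎡", "150㎡~200㎡", "Above 200㎡"]


def average_price_by_band(df, city_col="city", area_col="area",
                          price_col="price"):
    """Return a table of average price: one row per area band, one column per city."""
    # ASSUMPTION: left-closed bins, e.g. [50, 80).
    band = pd.cut(df[area_col], bins=BINS, labels=LABELS, right=False)
    banded = df.assign(area_band=band)
    grouped = banded.groupby([city_col, "area_band"],
                             observed=False)[price_col].mean()
    return grouped.unstack(city_col)


def draw_scatter(df, out_dir="Img", **cols):
    table = average_price_by_band(df, **cols)
    x = [str(label) for label in table.index]
    fig, ax = plt.subplots(figsize=(10, 6))
    for city in table.columns:
        ax.scatter(x, table[city], label=city)  # empty bands are skipped
    ax.set_title("Distribution chart of average rental prices "
                 "in different areas of each city")
    ax.set_xlabel("House area")
    ax.set_ylabel("Average rental price")
    ax.tick_params(axis="both", labelsize=8)  # scale label font size 8
    ax.legend()
    os.makedirs(out_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(out_dir, "scatter.png")
    fig.savefig(path)
    plt.close(fig)
    return path
```

If the default Matplotlib font lacks a glyph for ㎡ or for Chinese text, a suitable font would have to be configured separately.
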

Task 5: Data analysis report (5 points)

[Task Requirements]
This stage produces a data analysis report based on the data analysis results;
[Task Requirements Background]
By analyzing and visualizing the rental information data, we can better understand the overall urban rental situation, analyze the distribution of house types, housing price trends, the number of listings, rental agency information, and so on in different regions, and then make appropriate decision-making suggestions and plans based on the analysis and visualization results.
[Specific tasks]
1. Based on the information in "Rental Information in Each City.csv" and the visualization results of the previous four tasks, open the "data analysis report.docx" file under the "House" project on the virtual machine desktop, complete the data analysis report, and offer the "Aijiake" agency new suggestions for suitable housing listings.
2. Take a screenshot of the filled-in content and paste it into the corresponding location of the answer report.

Origin blog.csdn.net/Aluxian_/article/details/133355588