Cloud Computing and Big Data Chapter 7 Big Data Overview Practice Questions and Answers

Chapter 7 Big Data Overview Exercises

7.1 Multiple choice questions

1. Which of the following statements is incorrect? (B)

A. Data refers to identifiable symbols that record objective events

B. Information is the form and carrier of data

C. Data can only be called information in the process of transmission

D. The timeliness of information is of great significance to the use and transmission of information

2. From the perspective of data representation, the main typical characteristics of big data are (A).

              ① Massive (Volume)    ② Diverse (Variety)    ③ Fast (Velocity)    ④ Valuable (Value)

A. ①②③④     B. ②③④       C. ①③④        D. ①②④

3. Of the following, (B) is not a major stage of the big data life cycle.

A. Data Acquisition B. Data Compression C. Data Processing D. Results Visualization

4. At present, big data platforms mainly include big data acquisition platforms, batch processing platforms, stream data processing platforms, in-memory computing platforms, deep learning platforms, and so on. Of the following, (C) is a stream data processing platform.

A. Hadoop       B. PyTorch       C. Storm       D. TensorFlow

5. Nutch is a highly scalable search engine written in (D) language.

A. PyTorch       B. C       C. BASIC       D. Java

7.2 Fill-in-the-blank questions

1. Data (visualization) refers to the method of displaying data and analysis results graphically, in an intuitive and easy-to-understand form.

2. (Deep learning) analyzes and learns by building multi-level deep neural networks, combining low-level features into more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of the data.

3. In the Storm platform, a (Topology) is a directed acyclic graph composed of a series of Spouts and Bolts connected to one another by data streams (see the sketch after this list).

4. The name TensorFlow is composed of two parts: (tensor/Tensor) and (data flow/Flow).

5. Spark job execution generally adopts a (master-slave) architecture.
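
For item 3, the following is a minimal, hypothetical sketch (assuming the Storm 1.x-style Java API) of how a topology wires Spouts and Bolts into a directed acyclic graph. SentenceSpout, SplitBolt, and CountBolt are assumed user-defined classes, not part of Storm itself.

```java
// Hypothetical sketch (Storm 1.x-style Java API): wiring Spouts and Bolts
// into the directed acyclic graph that Storm calls a topology.
// SentenceSpout, SplitBolt and CountBolt are assumed user-defined classes.
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout: the source node of the graph, emitting a stream of tuples.
        builder.setSpout("sentence-spout", new SentenceSpout(), 1);

        // Bolt: subscribes to the spout's stream and splits sentences into words.
        builder.setBolt("split-bolt", new SplitBolt(), 2)
               .shuffleGrouping("sentence-spout");

        // Bolt: subscribes to the split bolt; tuples with the same "word" field
        // are routed to the same task so counts can be kept locally.
        builder.setBolt("count-bolt", new CountBolt(), 2)
               .fieldsGrouping("split-bolt", new Fields("word"));

        // Submit the resulting DAG to a local in-process cluster for testing.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", new Config(), builder.createTopology());
        Thread.sleep(10_000);   // let the topology run briefly
        cluster.shutdown();
    }
}
```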

7.3 Short answer questions

1. Please briefly describe the relationship between Nutch and Hadoop.

Answer:

Under the Hadoop distributed platform, Nutch effectively supports parallel data collection across multiple physical hosts. Nutch stores the collected data on the Hadoop distributed file system and uses Hadoop's MapReduce computing model to gather topic-related data from pages, so that a large amount of data can be collected in a short time.
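
As a rough illustration of the mechanism Nutch builds on, rather than Nutch's actual code, the minimal sketch below uses the standard org.apache.hadoop.mapreduce API to count words in crawled page text; the input and output paths are assumed to be HDFS directories supplied on the command line.

```java
// A minimal word-count sketch over page text stored on HDFS, illustrating
// the MapReduce mechanism Nutch relies on (not Nutch's own job classes).
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageWordCount {

    // Mapper: runs in parallel across HDFS blocks, emitting (word, 1) pairs.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: aggregates the per-word counts produced by all mappers.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "page-word-count");
        job.setJarByClass(PageWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // crawled page text on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // result directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```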

2. Analyzing relevant data can help companies reduce costs, improve efficiency, develop new products, and make more informed business decisions. What goals can enterprises generally achieve through big data analysis?

Answer:

(1) Analyze the root causes of failures, problems and defects in a timely manner, thereby reducing costs.

(2) Plan real-time traffic routes for thousands of express vehicles to avoid congestion.

(3) Analyze inventory and pricing, and clear stock with the goal of maximizing profit.

(4) Based on customers' purchasing habits, push promotional offers they may be interested in.

(5) Quickly identify high-value customers from a large customer base.

(6) Avoid fraudulent behavior through traffic analysis and data mining.

3. To ensure the correctness of its content, what rules does Wikipedia establish in terms of technology and operating procedures?

Answer:

(1) Version control. Every revision of an entry is retained; even if a participant deletes the entire entry, an administrator can easily restore it from the revision history.

(2) Entry locking. Locking is used to protect the content of certain key entries so that other users can no longer edit them.

(3) Edit summaries. When updating an entry, the editor can leave a note in the summary field so that administrators know the details of the update.

(4) IP banning. To prevent malicious users from damaging the system and its content, Wikipedia identifies and blocks their IP addresses to stop further damaging behavior.

(5) Sandbox testing. Wikipedia provides sandbox test pages so that first-time participants can familiarize themselves with the system's functions; even if they make mistakes there, no damage is done to real entries.

7.4 Discussion questions

1. The manufacturing industry needs data analysis technologies, tools, or platforms to intelligently discover new patterns and knowledge in large amounts of complex raw production data, as a basis for decisions that improve the production process. What layers does the architecture of a manufacturing-oriented data processing platform include?

Answer:

  1. Physical resource layer. The physical resource layer mainly includes the underlying physical devices, which can effectively support data storage and expansion.
  2. Logical resource layer. The logical resource layer includes storage resources and computing resources. Storage resources are built on the physical devices and include traditional databases, local file systems, and distributed file systems. Computing resources are logical computing units; the platform's computing power depends on the number of computing units, and by expanding and configuring that number the platform can effectively support the data mining tasks of the upper layers.
  3. Data analysis task management layer. This layer is the core of the data processing platform, effectively connecting the analysis functions with the back-end cluster. A well-designed data analysis platform needs task management capabilities, mainly including easy algorithm extension, support for configuring task flows and inter-task dependencies, task scheduling, and configuration of computing and storage resources (a minimal task-flow sketch follows this list). The platform supports data analysis task management through its data analysis framework.
  4. Data analysis layer. The data analysis layer provides the user-facing interface for executing specific analysis tasks, which mainly include data cubes, comparative analysis, time-dimension analysis, data manipulation, result display, and analysis reports.
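
To make the "task flow and inter-task dependency" idea in layer 3 more concrete, the following is a purely illustrative sketch (plain Java 16+, not any particular platform's API): analysis tasks form a small directed acyclic graph, and each task runs only after all of its dependencies have completed. The task names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Purely illustrative: a tiny task flow with inter-task dependencies,
// executed in dependency (topological) order. Task names are hypothetical.
public class TaskFlowDemo {

    record Task(String name, Runnable action, List<String> dependsOn) {}

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("acquire", () -> System.out.println("collect raw production data"), List.of()),
            new Task("clean",   () -> System.out.println("clean and normalize the data"), List.of("acquire")),
            new Task("cube",    () -> System.out.println("build the data cube"),          List.of("clean")),
            new Task("report",  () -> System.out.println("render the analysis report"),   List.of("cube"))
        );

        // Repeatedly run any task whose dependencies have all finished
        // (no cycle detection; a real scheduler would also parallelize).
        Set<String> done = new HashSet<>();
        while (done.size() < tasks.size()) {
            for (Task t : tasks) {
                if (!done.contains(t.name()) && done.containsAll(t.dependsOn())) {
                    t.action().run();
                    done.add(t.name());
                }
            }
        }
    }
}
```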
