HCIA-Big Data V3.0 Huawei Certified Big Data Engineer Online Course Chapter Test Questions Summary

1.Big data development trends and Kunpeng big data

1. (Single choice) Which of the following is not an emerging technology in the era of big data:

A.HBase

B.Hadoop

C.MySQL

D.Spark

Correct answer: C

2. (Single choice) The signs of the third wave of informatization are:

A. Popularization of cloud computing, big data, and Internet of Things technologies

B. Popularization of personal computers

C. Popularization of the Internet

D. Popularization of virtual reality technology

Correct answer: A

3. (Multiple choice) The 4V characteristics of big data include:

A.Large amount of data

B. Various data types

C. Fast processing speed

D. Low value density

Correct answer: ABCD

4. (Multiple choice) Which of the following is the correct understanding of the components of Hadoop:

A.Pig: A scripting language for processing large-scale data

B.Kafka: Distributed publish-subscribe messaging system

C.Oozie: Workflow and coordination service engine

D.Tez: Computing framework supporting DAG jobs

Correct answer: ABCD

5. (Judgment) “Big” is the key to big data, and big data must contain useful value!

Correct answer: False

2.HDFS Distributed File System and ZooKeeper

1. (Single choice) The HDFS namespace does not contain:

A.Block

B. Bytes

C.File

D.Directory

Correct answer: B

2. (Single choice) The advantages of using multi-copy redundant storage do not include:

A. Easy to check for data errors

B. Ensure data reliability

C. Save storage space

D. Speed up data transmission

Correct answer: C

3. (Multiple choice) The limitations of HDFS having only a single NameNode include:

A. Namespace restrictions

B. Cluster availability

C. Performance bottleneck

D.Isolation problem

Correct answer: ABCD

4. (Multiple choice) The Zookeeper cluster mainly has the following roles:

A.Leader

B.Follower

C.Observer

D.Master

Correct answer: ABC

5. (Judgment) Zookeeper’s child node Znode will inherit the ACL of the parent node.

Correct answer: False

3.Hive Distributed Data Warehouse

1. (Single choice) Which of the following explanations about Hive’s basic operating commands is incorrect:

A. create database userdb;//Create database userdb

B. create table if not exists usr(id bigint,name string,age int); //If the usr table does not exist, create the table usr with three attributes id, name, age

C. load data local inpath '/usr/local/data' overwrite into table usr;//Load the data from the files under the directory '/usr/local/data' into the usr table in append mode

D. insert overwrite table student select * from user where age>10;//Insert the rows with age greater than 10 from the user table into the student table, overwriting the original data in the student table

Correct answer: C

2. (Multiple choice) Which of the following statements is correct:

A.Hive and HDFS, HBase, Spark, Flink and other tools can be deployed uniformly on a Hadoop platform

B.Hive itself does not store and process data. It relies on HDFS to store data and MapReduce to process data.

C.HiveQL syntax is very similar to traditional SQL syntax

D. The data warehouse Hive can store data without the help of HDFS.

Correct answer: ABC

3. (Multiple choice) Which of the following are basic data types in Hive:

A.TINYINT

B.BINARY

C.FLOAT

D.STRING

Correct answer: ABCD

4. (Judgment) Hive was created to reduce the difficulty for programmers to use MapReduce.

Correct answer: Correct

4.HBase Technical principles

1. (Single choice) HBase is a () database.

A. Row database

B. Relational database

C. Column database

D.Document database

Correct answer: C

2. (Single choice) The sequence of the three-layer structure of HBase is:

A.ZooKeeper file, -ROOT- table, .META. table

B.ZooKeeper file, .META. table, -ROOT- table

C..META. table, ZooKeeper file, -ROOT- table

D.-ROOT- table, ZooKeeper file, .META. table

Correct answer: A

3. (Single choice) The client locates the Region through () level addressing.

A.Three

B. Four

C.Two

D.One

Correct answer: A
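
The three-level lookup in questions 2 and 3 can be pictured with a toy model. This is only a sketch under stated assumptions: the dictionaries, region names, and the `locate_region` helper are hypothetical stand-ins, not HBase client APIs, and they follow the classic layout these questions assume (newer HBase releases dropped the -ROOT- table).

```python
import bisect

# Hypothetical toy data. Each entry is (start_key, target), sorted by start_key.
ZOOKEEPER_FILE = {"root_location": "-ROOT-"}                  # level 1
ROOT_TABLE = [("", ".META.,part1"), ("m", ".META.,part2")]    # level 2
META_TABLE = {                                                # level 3
    ".META.,part1": [("", "region-A"), ("g", "region-B")],
    ".META.,part2": [("m", "region-C"), ("t", "region-D")],
}

def _lookup(entries, row_key):
    """Return the target whose start key is the greatest one <= row_key."""
    start_keys = [start for start, _ in entries]
    index = bisect.bisect_right(start_keys, row_key) - 1
    return entries[index][1]

def locate_region(row_key):
    """Follow ZooKeeper file -> -ROOT- table -> .META. table to a user region."""
    root_name = ZOOKEEPER_FILE["root_location"]      # step 1: find -ROOT-
    assert root_name == "-ROOT-"
    meta_region = _lookup(ROOT_TABLE, row_key)       # step 2: find the .META. region
    return _lookup(META_TABLE[meta_region], row_key) # step 3: find the user region
```

For example, `locate_region("hbase")` walks all three levels and lands on the hypothetical `region-B`.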

4. (Multiple choice) What are the differences between HBase and traditional relational databases:

A.Data maintenance

B. Storage mode

C.Data model

D.Data index

Correct answer: ABCD

5. (Multiple choice) What are the ways to access rows in the HBase table?

A. Access via a single row key

B. Access via a range of row keys

C. Full table scan

D. Through the value range of a certain column

Correct answer: ABC
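
HBase rows are addressed by row key, so the access paths in question 5 can be pictured as operations on a list sorted by key. The sketch below is a toy model with hypothetical data and helper names, not the HBase API. Note that a lookup by column value has no key to search on, so in this model it would have to go through `full_scan`.

```python
import bisect

# Hypothetical toy table: rows kept sorted by row key.
ROWS = [("row1", "a"), ("row3", "b"), ("row5", "c"), ("row7", "d")]
KEYS = [key for key, _ in ROWS]

def get(row_key):
    """Access by a single row key; returns None if the row is absent."""
    i = bisect.bisect_left(KEYS, row_key)
    if i < len(KEYS) and KEYS[i] == row_key:
        return ROWS[i][1]
    return None

def scan_range(start_key, stop_key):
    """Access by a row key range: start inclusive, stop exclusive."""
    lo = bisect.bisect_left(KEYS, start_key)
    hi = bisect.bisect_left(KEYS, stop_key)
    return ROWS[lo:hi]

def full_scan():
    """Full table scan: visit every row in key order."""
    return list(ROWS)
```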

5.MapReduce and Yarn technology principles

1. (Single choice) When using the MapReduce program WordCount to perform word frequency statistics, for the text line "hello hadoop hello world", the intermediate result directly output after being processed by the Map function of the WordCount program should be in the following form:

A.<"hello",1>, <"hello",1>, <"hadoop",1> and <"world",1>

B.<"hello",1,1>, <"hadoop",1> and <"world",1>

C.<"hello",2>, <"hadoop",1> and <"world",1>

D.<"hello",<1,1>>, <"hadoop",1> and <"world",1>

Correct answer: A

2. (Single choice) For the text line "hello hadoop hello world", the result after processing by WordCount's Reduce function is:

A.<"hello",1><"hello",1><"hadoop",1><"world",1>

B.<"hello",1,1><"hadoop",1><"world",1>

C.<"hello",<1,1>><"hadoop",1><"world",1>

D.<"hello",2><"hadoop",1><"world",1>

Correct answer: D
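
The WordCount stages in questions 1 and 2 can be traced with a minimal single-process sketch. The function names are hypothetical and nothing here uses the Hadoop API; it only mirrors the data shapes: Map emits one pair per word occurrence, the shuffle groups values by key, and Reduce sums them.

```python
from collections import defaultdict

def map_line(line):
    """Map: emit one (word, 1) pair per word occurrence, in input order."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group the values of identical keys into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_counts(grouped):
    """Reduce: sum the value list of each key."""
    return {key: sum(values) for key, values in grouped.items()}

pairs = map_line("hello hadoop hello world")
counts = reduce_counts(shuffle(pairs))
```

Here `pairs` holds four pairs, two of them for "hello", and `counts` maps "hello" to 2.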

3. (Multiple choice) Which of the following parts does the MapReduce V1 architecture mainly consist of:

A.JobTracker

B.Client

C.Task

D.TaskTracker

Correct answer: ABCD

4. (Judgment) One of the concepts of MapReduce design is "computation moves closer to data" rather than "data moves closer to calculation", because moving data requires a lot of network transmission overhead.

Correct answer: Correct

5. (Judgment) Given two key-value pairs <"a",1> and <"a",1>, merging them yields <"a",2>, and combining them yields <"a",<1,1>>.

Correct answer: False
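
The statement is false because it swaps the two operations. A small sketch, using hypothetical helper names rather than any Hadoop API, shows the difference: combining adds up the values of identical keys, while merging collects them into a list.

```python
def combine(pairs):
    """Combine: add up the values of identical keys."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

def merge(pairs):
    """Merge: collect the values of identical keys into a list."""
    lists = {}
    for key, value in pairs:
        lists.setdefault(key, []).append(value)
    return lists

pairs = [("a", 1), ("a", 1)]
combined = combine(pairs)   # corresponds to <"a",2>
merged = merge(pairs)       # corresponds to <"a",<1,1>>
```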

6.Spark In-memory distributed computing

1. (Single choice) Which of the following languages does Spark SQL currently not support:

A.Matlab

B.Java

C. Python

D.Scala

Correct answer: A

2. (Single choice) RDD operations are divided into two types: Transformation and Action. The following operations belong to the Action type:

A.filter

B.count

C.groupBy

D.map

Correct answer: B
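
The distinction in question 2 comes down to lazy evaluation: a transformation only describes a new dataset, while an action triggers the computation and returns a result. The class below is a hypothetical toy written in plain Python to show that behaviour; it is not the Spark API.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)   # pending transformations, not yet executed

    def map(self, fn):               # transformation: returns a new ToyRDD
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, fn):            # transformation: returns a new ToyRDD
        return ToyRDD(self._data, self._steps + (("filter", fn),))

    def _run(self):
        items = self._data
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):                 # action: triggers execution, returns a number
        return len(self._run())

    def collect(self):               # action: triggers execution, returns the data
        return self._run()

calls = []

def double(x):
    calls.append(x)                  # side effect to show when the work happens
    return x * 2

rdd = ToyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)    # still empty: nothing has run yet
result = rdd.count()                 # the action runs the pipeline
```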

3. (Single choice) Which of the following pairings of big data computing type and software framework is not suitable:

A. Complex batch data processing: MapReduce

B. Data processing based on real-time data flow: Flink

C. Interactive query based on historical data: Impala

D. Calculation of graph structured data: Hive

Correct answer: D

4. (Multiple choice) The main features of Spark include:

A. Good versatility

B. Various operating modes

C. Fast running speed

D.Easy to use

Correct answer: ABCD

5. (Multiple choice) Spark’s operating architecture includes:

A. Worker Node that runs job tasks

B. Cluster Manager, the cluster resource manager

C. Driver, the task control node of each application

D. Executor, the execution process responsible for specific tasks on each worker node

Correct answer: ABCD

7.Flink Stream-batch integrated distributed real-time processing engine

1. (Single choice) Which of the following is not one of Flink's time types:

A.event time

B.create time

C.ingestion time

D.processing time

Correct answer: B

2. (Single choice) The characteristics of the session window are:

A. Time alignment, fixed window length, no overlap

B. Time alignment, fixed window length, overlap

C. Time alignment, window length is not fixed, and there is overlap

D. No time alignment

Correct answer: D
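
Session windows are not aligned to the clock because their boundaries come from gaps in activity. A minimal sketch of that idea follows; `session_windows` is a hypothetical helper in plain Python, not the Flink API, and its handling of the exact gap boundary is a simplification.

```python
def session_windows(timestamps, gap):
    """Group event timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)      # still within the current session
        else:
            sessions.append([ts])        # inactivity gap exceeded: start a new session
    return sessions

windows = session_windows([1, 2, 3, 10, 11, 30], gap=5)
```

The resulting windows start at 1, 10 and 30 and have different lengths, which is why neither a fixed length nor time alignment applies.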

3. (Multiple choice) Flink provides built-in state management, which can store these states inside Flink without storing it in an external system. The benefits of doing this are:

A. Increased memory consumption

B. Reduces the calculation engine’s dependence on external systems

C. Greatly improves performance

D. Make deployment, operation and maintenance easier

Correct answer: BCD

4. (Judgment) Flink uses stream computing to simulate batch processing.

Correct answer: Correct

5. (Judgment) Watermark is a concept based on event time and is used to characterize the integrity of the data flow.

Correct answer: Correct
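
One common way to generate watermarks is to take the largest event time seen so far and subtract an allowed delay; an event whose timestamp is not ahead of the current watermark is treated as late. The sketch below illustrates that strategy in plain Python. The function name and the tuple layout are assumptions for illustration, and the arithmetic is simplified compared with Flink's built-in generators.

```python
def run_with_watermark(event_times, max_out_of_orderness):
    """Track a bounded-out-of-orderness watermark and flag late events."""
    watermark = float("-inf")
    max_seen = float("-inf")
    results = []
    for t in event_times:
        is_late = t <= watermark              # the watermark already passed this timestamp
        max_seen = max(max_seen, t)
        watermark = max_seen - max_out_of_orderness
        results.append((t, watermark, is_late))
    return results

trace = run_with_watermark([10, 12, 11, 20, 13], max_out_of_orderness=3)
```

In this trace the event with timestamp 11 arrives out of order but is still accepted, while 13 arrives after the watermark has advanced to 17 and is flagged as late.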

8.Flume Massive log aggregation

1. (Single choice) Which statement about source is incorrect:

A. A driven source is an external source that actively sends data to Flume and drives Flume to receive the data.

B. A polling source is one from which Flume actively fetches data at regular intervals.

C. A source does not need to be associated with any channel.

D. A source is responsible for receiving events or generating events through a special mechanism.

Correct answer: C

2. (Single choice) Which statement about Flume’s log processing process is incorrect:

A. Amount of data received by the source

B. Amount of data processed by the channel

C. Amount of data written by the sink

D. Manager displays the monitoring indicators graphically

Correct answer: B

3. (Multiple choice) Which of the following is the correct understanding of Flume:

A. Provides the ability to collect log information from a fixed directory to the destination.

B. Provide the ability to collect log information to the destination in real time.

C. Supports cascading and the ability to merge data.

D. Support the ability to collect data according to user customization.

Correct answer: ABCD

4. (Judgment) Flume multi-agent architecture is mainly used to collect logs from nodes outside the MRS cluster, and finally aggregate them into the cluster through multiple Flume nodes.

Correct answer: Correct

5. (Judgment) In the Flume architecture, the Sink Runner mainly drives the Sink Processor, and the Sink Processor drives the sink to take data from the channel.

Correct answer: Correct

9.Loader Data conversion

1. (Single choice) The module in Loader used to manage the active and standby status of the Loader Server process is:

A.Transform Engine

B.Metadata Repository

C.HA Manager

D.Job Manager

Correct answer: C

2. (Single choice) Loader provides a wealth of job conversion rules. If you need to replace null values with specified values, which operator can be used:

A. Long integer time conversion

B. Null value conversion

C. Random value conversion

D. Cut the string

Correct answer: B

3. (Judgment) Compared with open source Sqoop, Loader adds a graphical interface, high performance, high reliability, and security.

Correct answer: Correct

4. (Judgment) job is used to describe the process of extracting, transforming and loading data from the data source to the destination.

Correct answer: Correct

10.Kafka Distributed message subscription system

1. (Single choice) Regarding the characteristics of Kafka, which of the following statements is wrong:

A.Kafka supports message partitioning and distributed consumption, while ensuring the sequential transmission of messages within each partition.

B.Kafka supports both offline data processing and real-time data processing

C.Kafka has the advantages of high throughput, message persistence, high reliability, and high scalability.

D.Kafka uses hard disk to persist messages, so its performance is slightly lower than other message queues.

Correct answer: D

2. (Single choice) Which of the following is not a role in Kafka:

A.leader

B.ResourceManager

C.follower

D.controller

Correct answer: B

3. (Multiple choice) Which of the following descriptions of concepts in Kafka is correct:

A.A Kafka cluster contains one or more service instances, which are called Brokers.

B.Consumer: Message consumer, the client that reads messages from Kafka Broker

C.Producer: Responsible for publishing messages to Kafka Broker

D.Kafka divides the Topic into one or more Partitions. Each Partition physically corresponds to a folder, and all the messages of this Partition are stored in this folder.

Correct answer: ABCD

4. (Judgment) Data is shared between consumer groups, while consumers within the same group compete for it.

Correct answer: Correct
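
The sharing and competing behaviour can be pictured with a toy delivery model: every group sees all partitions, but inside a group each partition is owned by exactly one consumer. The `deliver` function and the round-robin assignment below are assumptions for illustration, not the Kafka client API or its actual assignment strategy.

```python
def deliver(messages_by_partition, groups):
    """Return the messages each consumer receives.

    messages_by_partition: {partition_id: [messages]}
    groups: {group_name: [consumer_names]}
    """
    received = {}
    for group, consumers in groups.items():
        for consumer in consumers:
            received[(group, consumer)] = []
        # Within a group, each partition has exactly one owner (round-robin here).
        for i, partition in enumerate(sorted(messages_by_partition)):
            owner = consumers[i % len(consumers)]
            received[(group, owner)].extend(messages_by_partition[partition])
    return received

out = deliver(
    {0: ["m0", "m1"], 1: ["m2"]},
    {"g1": ["c1", "c2"], "g2": ["c3"]},
)
```

Group g2 has a single consumer and receives every message, while c1 and c2 in group g1 split the partitions between them.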

5. (Judgment) Kafka’s log cleaning methods include: delete and compact.

Correct answer: Correct
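
The two cleanup policies differ in what they keep: delete discards old data past a retention limit, while compact keeps the most recent record for every key. The sketch below models both on a list of records. The record layout and function names are hypothetical, and real Kafka applies these policies to log segments rather than to individual records.

```python
def cleanup_delete(log, now, retention):
    """'delete' policy: drop records older than the retention period."""
    return [rec for rec in log if now - rec["ts"] <= retention]

def cleanup_compact(log):
    """'compact' policy: keep only the latest record for each key, in log order."""
    latest_offset = {rec["key"]: i for i, rec in enumerate(log)}
    return [rec for i, rec in enumerate(log) if latest_offset[rec["key"]] == i]

log = [
    {"key": "a", "value": 1, "ts": 0},
    {"key": "b", "value": 2, "ts": 1},
    {"key": "a", "value": 3, "ts": 9},
    {"key": "c", "value": 4, "ts": 9},
]
after_delete = cleanup_delete(log, now=10, retention=5)
after_compact = cleanup_compact(log)
```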

11.LDAP and Kerberos

1. (Single choice) Which of the following single sign-on technologies is used in Huawei’s big data platform:

A. Cookie technology

B.Broker-based technology

C.Gateway-based technology

D.Token-based technology

Correct answer: B

2. (Single choice) The command used by the Ldap client to query user information in Ldap is:

A.klist

B.ldapdelete

C.ldapadd

D.ldapsearch

Correct answer: D

3. (Multiple choice) The unified authentication management systems of most vendors consist of the following parts:

A. Unified identity authentication management module

B. Unified identity authentication server

C. Identity information storage server

D. Unified identity authorization module

Correct answer: ABC

4. (Multiple choice) What are the core elements of the KrbServer authentication mechanism:

A.Kerberos Client

B.Kerberos KDC Server

C.AES(Advanced Encryption Standard)

D.KDC(Key Distribution Center)

Correct answer: ABD

5. (Judgment) The directory service stores and traverses data in a tree structure, just like traditional relational databases.

Correct answer: False

12.Distributed full-text search service ElasticSearch

1. (Single choice) Which of the following statements is incorrect:

A.ElasticSearch can be used as a NoSQL database

B.ElasticSearch does not support unstructured data

C.ElasticSearch is a full-text search service based on Lucene

D.ElasticSearch can be used in log search and analysis, spatiotemporal retrieval, time series retrieval and other scenarios

Correct answer: B

2. (Single choice) ElasticSearch cache mainly includes:

A.Query Cache

B.Fielddata Cache

C.Request Cache

D.All of the above

Correct answer: D

3. (Single choice) ElasticSearch capacity reduction scenarios do not include:

A. The node needs to reinstall the operating system

B. The index data of a single instance is too large

C. The amount of cluster data is reduced

D. Retirement scenario

Correct answer: B

4. (Judgment) ElasticSearch will adopt an automatic balancing strategy after expansion.

Correct answer: Correct

5. (Judgment) For secure clusters, encryption authentication is supported for ElasticSearch access.

Correct answer: Correct

13.Redis In-memory database

1. (Single choice) The correct description of the multi-database feature of Redis is:

A. The database name can be customized.

B. Database No. 0 is selected by default.

C.select is used to select fields.

D. There are 16 databases by default, and this default cannot be modified.

Correct answer: B

2. (Multiple choice) What are the characteristics of Redis usage scenarios:

A.High performance

B. Low latency

C. Rich data structure access

D.Support persistence

Correct answer: ABCD

3. (Multiple choice) The correct description of the string type of Redis is:

A. Support set to set key-value.

B.strlen can return the string length.

C.incr decrements the value stored in key by one.

D. If the key does not exist, append behaves the same as set.

Correct answer: ABD
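
The behaviour of these string commands can be checked against a small in-memory model. `ToyRedisStrings` is a hypothetical stand-in written for illustration, not a Redis client; it only mimics the documented semantics of SET, GET, STRLEN, INCR and APPEND on string values.

```python
class ToyRedisStrings:
    """Hypothetical in-memory stand-in for a few Redis string commands."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = str(value)

    def get(self, key):
        return self._store.get(key)

    def strlen(self, key):
        # Redis counts bytes; this toy counts characters. A missing key gives 0.
        return len(self._store.get(key, ""))

    def incr(self, key):
        # INCR adds one (it does not subtract); a missing key is treated as 0.
        value = int(self._store.get(key, "0")) + 1
        self._store[key] = str(value)
        return value

    def append(self, key, suffix):
        # On a missing key APPEND starts from an empty string, so it acts like SET.
        self._store[key] = self._store.get(key, "") + suffix
        return len(self._store[key])

r = ToyRedisStrings()
r.set("n", 10)
n_after_incr = r.incr("n")
length_after_append = r.append("greet", "hi")
```

Option C is the wrong one because INCR increases the stored value, as `n_after_incr` shows.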

4. (Judgment) Redis is a network-based, high-performance key-value embedded database.

Correct answer: False

5. (Judgment) Redis cannot be used as a cache for relational databases to improve access speed.

Correct answer: False

14.Huawei Big Data Solution

1. (Single choice) The engine services used for graph analysis and query in Huawei’s big data solution are:

A.GES

B.DWS

C.CSS

D.DLI

Correct answer: A

2. (Single choice) The functional modules that provide batch data migration services between homogeneous/heterogeneous data sources under the DAYU platform architecture and help customers realize the free flow of data inside and outside the lake and between lakes are:

A.Data development

B.Data integration

C.Data governance

D.Asset management

Correct answer: B

3. (Multiple choice) The obstacles faced by enterprises in digital transformation include:

A. Chimney application

B. Data silos

C. Resources are dispersed

D. Data failure

E. Data openness and privacy

F. Low data availability and poor quality

Correct answer: ABCDEF

4. (Multiple choice) What functions does Huawei Cloud Intelligent Data Lake DAYU platform provide:

A.Data integration

B. Standard design

C.Data development

D.Data quality monitoring

E.Data asset management

F.Data visualization

Correct answer: ABCDEF

5. (Judgment) The advantages of Huawei Cloud MRS are concentrated in high performance, high reliability, command line client, and elastic scaling.

Correct answer: False

Origin blog.csdn.net/gaogao0305/article/details/134584767