A collection of Hadoop-related short-answer questions

Question 1: Describe the common Linux directories and their roles.
Answer:
1. /etc: all the configuration files and subdirectories needed for system administration.
2. /home: the home directories of ordinary users; in Linux each user has their own directory, generally named after the user's account.
3. /mnt: provided so that users can temporarily mount other file systems; for example, you can mount a CD-ROM drive under /mnt/ and then enter that directory to view the drive's contents. (/net stores some network-related files.)
4. /opt: the directory for additional software installed on the host; for example, an Oracle database could be placed here. Empty by default.
5. /root: the home directory of the system administrator, the root user.
6. /tmp: stores temporary files.
7. /usr: a very important directory; many applications and user files are placed here, similar to the Program Files directory on Windows.
8. /var: holds things that keep growing; directories that are modified frequently are usually placed here, including the various log files.

Question 2: Describe how to use the vi editor.
Answer:
1. Opening a file with vi puts you in normal mode, where you can perform the following operations:
1) yy (copy the current cursor line)
   y<n>y (copy n lines starting from the cursor line; e.g. y2y copies the cursor line and the line below it)
2) p (paste the copied content on the line below the cursor)
3) u (undo the last action)
4) dd (delete the current cursor line)
   d<n>d (delete n lines starting from the cursor line, inclusive)
5) Shift + ^ (move to the beginning of the line)
6) Shift + $ (move to the end of the line)
7) 1, then Shift + G (move to the first line of the file; press the number 1 first, then Shift + G, not all keys at once)
8) Shift + G (move to the last line of the file)
9) n, then Shift + G (move to line n)
2. Press i or o to enter edit (insert) mode.
In normal mode you can delete, copy, and paste, but you cannot edit the file contents. To enter edit mode, press one of the letters i, I, o, O, a, A, r, R, and so on. To exit edit mode and return to normal mode, press the Esc key.
3. Press : or / to enter command-line mode.
:wq! — force save and quit
:q! — quit vi without saving
Press the Esc key to return to normal mode, then press Shift + Z + Z to save and quit quickly (this only works for writable files; read-only files still need :wq!).

Question 3: List the common Linux file and directory commands.
Answer:
1. pwd — display the absolute path of the current working directory
2. ls — list the contents of a directory
3. mkdir — create a new directory
4. rmdir — delete an empty directory
5. touch — create an empty file
6. cd — change directory
7. cp — copy a file or directory
8. rm — remove a file or directory
9. mv — move or rename files and directories
10. cat — view the contents of a file
11. more — view the contents of a file page by page
12. tail — view the end of a file

Question 4: Briefly describe the Linux date and time commands and their uses.
Answer:
date -s — set the system time
date -d — display a time other than the current one
date — display the current time
cal — view the calendar

Question 5: Give the ordinary user hadoop root privileges.
Answer:
Modify the /etc/sudoers file: locate the root line below and add a matching line for hadoop, as follows:
## Allow root to run any commands anywhere
root    ALL=(ALL)    ALL
hadoop  ALL=(ALL)    ALL

Question 6: List the common Linux compression and decompression commands and their usage.
Answer:
1. gzip / gunzip — compress and decompress
2. zip / unzip — compress and decompress
3. tar -zcvf — create a gzipped tar archive; tar -zxvf — extract one

Question 7: Outline the components of Hadoop and their functions.
Answer:
1) Hadoop HDFS (Hadoop Distributed File System): a highly reliable, high-throughput distributed file system.
2) Hadoop MapReduce: a distributed framework for offline parallel computation.
3) Hadoop YARN: a framework for job scheduling and cluster resource management.
4) Hadoop Common: a tool module supporting the other modules (configuration, RPC, serialization mechanism, log management).

Question 8: Outline the advantages of Hadoop.
Answer:
1) High reliability: Hadoop assumes that compute elements and storage can fail, so it maintains multiple working copies of the data; when a failure occurs, processing can be redistributed from the failed node.
2) High scalability: task data is distributed across the cluster, which can easily be scaled out to thousands of nodes.
3) Efficiency: following the MapReduce model, Hadoop works in parallel to speed up processing.
4) High fault tolerance: multiple copies of the data are saved automatically, and failed tasks are automatically reassigned.

Question 9: Outline the four characteristics of big data.
Answer:
1. Volume (huge amounts of data)
2. Velocity (high speed)
3. Variety (diverse types)
4. Value (low value density)

Question 10: Describe the concept of big data and the problems it solves.
Answer:
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a certain time frame; they are massive, high-growth, and diverse information assets that require new processing models in order to provide stronger decision-making power, insight, and process-optimization capability.
It mainly solves the storage of massive data and the computation and analysis of massive data.

Question 11: What are the disadvantages of MapReduce?
Answer:
MapReduce is not good at real-time computation, stream computation, or DAG (directed acyclic graph) computation.

Question 12: What determines the number of MapTasks in MapReduce?
Answer:
The number of input splits determines the number of MapTasks (a sketch of the split-size computation follows).
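
As a reference, the snippet below paraphrases how Hadoop's FileInputFormat derives the split size (the exact source varies by version); the file size in the comment is an illustrative example.

```java
// A paraphrase of FileInputFormat's split-size rule: one MapTask is launched per split.
public class SplitSizeDemo {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // HDFS default block size
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize
        long maxSize = Long.MAX_VALUE;       // mapreduce.input.fileinputformat.split.maxsize
        // with the defaults, split size equals block size, so a 300 MB file -> 3 splits -> 3 MapTasks
        System.out.println(computeSplitSize(blockSize, minSize, maxSize) / (1024 * 1024) + " MB");
    }
}
```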

Question 13: What are the advantages of Hive?
Answer:
1. The operating interface uses SQL-like syntax, providing rapid development capability (simple and easy to use).
2. It avoids writing MapReduce, reducing developers' learning costs.
3. Hive's execution latency is relatively high, so it is commonly used for data analysis and for applications with low real-time requirements.
4. Hive's advantage lies in processing big data; it has no advantage in processing small data, because its execution latency is relatively high.
5. Hive supports user-defined functions; users can implement their own functions according to their needs.

Question 14: Outline the MapReduce programming specification.
Answer:
1. Mapper stage
(1) The user-defined Mapper must extend its parent class.
(2) The Mapper's input data is in the form of KV pairs.
(3) The Mapper's business logic is written in the map() method.
(4) The Mapper's output data is in the form of KV pairs.
(5) The map() method is called once for each <K, V> pair.
2. Reducer stage
(1) The user-defined Reducer must extend its parent class.
(2) The Reducer's input data type corresponds to the Mapper's output data type, i.e. KV pairs.
(3) The Reducer's business logic is written in the reduce() method.
(4) The ReduceTask process calls the reduce() method once for each group of <K, V> pairs with the same key.
3. Driver stage
The Driver is the equivalent of a YARN cluster client, used to submit the whole program to the YARN cluster; it packages the MapReduce program's run parameters into a Job object and submits it (a minimal WordCount sketch follows below).
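
A minimal WordCount sketch illustrating the Mapper/Reducer/Driver specification above; input and output paths come from the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // (1) extends Mapper; (2)(4) KV in, KV out; (5) map() runs once per <K, V>
    public static class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text word = new Text();
        private final IntWritable one = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                context.write(word, one); // (3) business logic lives in map()
            }
        }
    }

    // reduce() runs once per group of values sharing the same key
    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: package the job's run parameters and submit to the cluster
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```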

Question 15: To serialize a custom Hadoop bean object for transmission within the framework, what conditions must the class satisfy?
Answer:
1. It must implement the Writable interface.
2. During deserialization, reflection needs to call the no-argument constructor, so the class must have a no-argument constructor.
3. Override the serialization method.
4. Override the deserialization method.
5. Note that the order of deserialization must be identical to the order of serialization.
6. To display the result in a file, override toString(); fields can be separated with "\t" to facilitate later use. (A sketch follows below.)
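
A sketch of a custom Writable bean satisfying the conditions above; the bean name and fields (upFlow, downFlow) are illustrative.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class FlowBean implements Writable {              // 1. implement Writable
    private long upFlow;
    private long downFlow;

    public FlowBean() {}                                 // 2. no-arg constructor for reflection

    @Override
    public void write(DataOutput out) throws IOException {   // 3. serialization
        out.writeLong(upFlow);
        out.writeLong(downFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // 4. deserialization,
        upFlow = in.readLong();                                // 5. same field order as write()
        downFlow = in.readLong();
    }

    @Override
    public String toString() {                           // 6. tab-separated output
        return upFlow + "\t" + downFlow;
    }
}
```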

Question 16: Briefly describe how to implement a custom partition in MapReduce.
Answer:
1. Define a class that extends Partitioner and override the getPartition() method (see the sketch below).
2. In the Driver, set the job to use the custom partitioner class.
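
A minimal custom-partition sketch; the partitioning rule (split keys by first letter) is purely illustrative.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // route keys starting with 'a'..'m' to partition 0, everything else to 1
        String s = key.toString();
        char c = s.isEmpty() ? 'z' : Character.toLowerCase(s.charAt(0));
        return (c >= 'a' && c <= 'm') ? 0 : 1;
    }
}
```

In the Driver, wire it up with job.setPartitionerClass(FirstLetterPartitioner.class) and job.setNumReduceTasks(2), so the number of ReduceTasks matches the number of partitions.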

Question 17: Describe the common YARN resource schedulers.
Answer:
1. FIFO Scheduler
2. Capacity Scheduler
3. Fair Scheduler

Question 18: Describe the difference between internal and external tables in Hive.
Answer:
1. When Hive creates an internal table, it moves the data into the data warehouse path; when it creates an external table, it only records the path where the data resides and does not change the data's location in any way.
2. When a table is dropped, an internal table's metadata and data are deleted together, while for an external table only the metadata is deleted, not the data.

Question 19: When is the map() method of a custom Mapper class called? When is the reduce() method of a custom Reducer called?
Answer:
In the Mapper stage, the map() method is called once for each <K, V> pair.
In the Reducer stage, the ReduceTask process calls the reduce() method once for each group of <K, V> pairs with the same key.

Question 20: Briefly describe enterprise-level Hive tuning.
Answer:
1. Enable Fetch fetching (so simple queries can skip MapReduce)
2. Enable local mode
3. Table optimization
4. Resolve data skew
5. Use strict mode appropriately

Question 21: What are ZooKeeper's application scenarios?
Answer:
ZooKeeper's services include: dynamic online/offline awareness of server nodes, unified configuration management, load balancing, cluster management, and so on.

Question 22: What are HMaster's functions?
Answer:
1. Monitor the RegionServers
2. Handle RegionServer failover
3. Handle metadata changes
4. Handle region assignment and movement
5. Balance the data load during idle time
6. Publish its own location to clients via ZooKeeper

Question 23: Briefly describe your understanding of ZooKeeper.
Answer:
ZooKeeper is an open-source, distributed Apache project that provides coordination services for distributed applications.
Understood from a design-pattern point of view, ZooKeeper is a distributed service-management framework based on the observer pattern: it stores and manages the data everyone cares about and accepts registrations from observers; once the state of that data changes, ZooKeeper is responsible for notifying the observers registered with it so they can react accordingly, thereby achieving Master/Slave-style cluster management.
ZooKeeper = file system (data can be stored on zk) + notification mechanism

Question 24: Briefly describe the four types of znodes in ZooKeeper.
Answer:
1. Persistent node
2. Persistent sequential node
3. Ephemeral node
4. Ephemeral sequential node
(A creation sketch for each type follows.)
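
A sketch of creating each znode type with the ZooKeeper Java client; the connection string and paths are illustrative.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTypes {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
        byte[] data = "demo".getBytes();
        // 1. persistent node: survives client disconnect
        zk.create("/p", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // 2. persistent sequential node: ZooKeeper appends a monotonic counter to the name
        zk.create("/pseq-", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        // 3. ephemeral node: deleted automatically when the session ends
        zk.create("/e", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // 4. ephemeral sequential node: ephemeral plus a numbered suffix
        zk.create("/eseq-", data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        zk.close();
    }
}
```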

Question 25: Briefly describe the principle of ZooKeeper listeners.
Answer:
1. First there must be a main() thread.
2. The ZooKeeper client is created in the main thread, and it in turn creates two threads: one responsible for network connection communication (connect), one responsible for listening (listener).
3. The registered listen events are sent to ZooKeeper through the connect thread.
4. ZooKeeper adds the registered listen events to its list of registered listeners.
5. When ZooKeeper detects a data or path change, it sends this message to the listener thread.
6. The listener thread then calls the process() method internally. (A minimal watcher sketch follows.)
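
A minimal sketch of registering a watcher on a znode; the connection string and the path /config are illustrative, and the node must already exist.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchDemo {
    public static void main(String[] args) throws Exception {
        // creating the client spawns the connect and listener threads described above
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
            @Override
            public void process(WatchedEvent event) { // invoked by the listener thread
                System.out.println(event.getType() + " on " + event.getPath());
            }
        });
        // register a one-shot watch on /config; a data change triggers process() once
        zk.getData("/config", true, null);
        Thread.sleep(Long.MAX_VALUE); // keep the main thread alive to receive events
    }
}
```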

Question 26: Describe the characteristics of HBase.
Answer:
1. Massive storage
2. Column-family storage
3. Easy scalability
4. High concurrency
5. Sparsity

Question 27: Briefly describe three roles ZooKeeper plays in HBase.
Answer:
1. ZooKeeper ensures that only one Master is running in the cluster; if the Master fails, a new Master is elected through a competition mechanism to provide service.
2. ZooKeeper monitors the state of the RegionServers; when a RegionServer fails, it notifies the Master of the RegionServer's online/offline information in the form of a callback (the dynamic online/offline mechanism is implemented with zk on the server side).
3. ZooKeeper stores the unified entry address for the metadata.

Question 28: Briefly describe the HBase read data flow.
Answer:
1. The client first accesses ZooKeeper, reads the location of the meta table's region, and then reads the data in the meta table; the meta table in turn stores the region information of the user tables.
2. Using the namespace, table name, and rowkey, the corresponding region information is found in the meta table.
3. The RegionServer hosting that region is located.
4. The corresponding region is found.
5. The search starts in the MemStore; if the data is not there, the BlockCache is read.
6. If the BlockCache does not have it either, the StoreFile is read (for read efficiency).
7. If the data was read from a StoreFile, it is not returned to the client directly; it is first written to the BlockCache and then returned to the client. (A client-side sketch follows.)
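
From the client side, the whole flow above is triggered by a single Get; the MemStore/BlockCache/StoreFile lookup happens inside the RegionServer, transparently. A sketch with the HBase Java client, where the table, row, and column names are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost"); // step 1: client asks zk for meta
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user"))) {
            Get get = new Get(Bytes.toBytes("row1")); // steps 2-4: meta lookup, region routing
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            Result result = table.get(get);           // steps 5-7 happen server-side
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```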

Question 29: Why does HBase rowkey design matter?
Answer:
1. So that data is evenly distributed across all regions, preventing data skew to a certain extent.
2. So that the rowkey is easy to remember, making it convenient to retrieve the corresponding data by rowkey later.
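
One common technique for the even-distribution goal in point 1 is salting: prepending a hash-derived prefix so consecutive keys land in different regions. A sketch under that assumption; the field choices are illustrative, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

public class RowKeyDesign {
    // prepend a hash-derived salt so consecutive user IDs spread across regions
    static byte[] saltedRowKey(String userId, int regionCount) {
        int salt = Math.abs(userId.hashCode()) % regionCount;
        String key = String.format("%02d_%s", salt, userId);
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(saltedRowKey("user10001", 4), StandardCharsets.UTF_8));
    }
}
```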

Question 30: Briefly describe how Kafka implements broadcast and unicast of messages.
Answer:
Kafka uses Consumer Groups (CG) to implement both broadcast (send to all consumers) and unicast (send to exactly one consumer) of a topic's messages. A topic can have multiple partitions and can correspond to multiple CGs. The topic's messages are copied (not a real copy, only conceptually) to all CGs, but each partition delivers a message to only one consumer within a CG. To broadcast, give each consumer its own independent CG; to unicast, put all consumers in the same CG. Consumer groups also let consumers be grouped freely without having to send messages multiple times to different topics. (A consumer sketch follows.)
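
A sketch of the consumer-group semantics with the Kafka Java client; the broker address, topic, and group IDs are illustrative. Run two copies with the SAME group.id for unicast (messages split between them), or with DIFFERENT group.ids for broadcast (each copy receives every message).

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", args.length > 0 ? args[0] : "cg-1"); // the CG this consumer joins
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
            }
        }
    }
}
```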

Origin blog.csdn.net/NewBeeMu/article/details/102772750