Article Directory
- Why Hadoop is faster than traditional technical solutions
- What are the characteristics of big data?
- What do the shell client operation commands of hdfs mean?
- What can big data do?
- What are the main functions of hdfs?
- In which file is Hadoop's trash can mechanism configured?
- What are the trash can configuration parameters?
- Command to start the JobHistoryServer service process?
- What is the default port of the JobHistoryServer web UI?
- What are the files that need to be configured when installing hadoop?
- When HDFS is started for the first time, which command is used to format it?
- What folders are included in the hadoop installation package directory and what are their functions?
- Hadoop feature advantages?
- What are the ways to deploy Hadoop?
- Command for network synchronization?
- In which file is the hostname set?
- Which file is used to configure IP and hostname mapping?
- Command to start HDFS NameNode?
- Start HDFS DataNode on a single node?
- Start YARN ResourceManager on a single node?
- What are the one-click startup and shutdown script commands for HDFS clusters?
- A brief overview of the difference between Hadoop's combiner and partitioner
- What does HBase rely on to provide a message communication mechanism?
- Please describe in detail the structure of a Cell in Hbase
- The timing of compact triggering in hbase
- The difference between hbase and mysql
- The compact role of hbase
- Big data processing flow
- How to deal with Hbase downtime
Why Hadoop is faster than traditional technical solutions
1. Distributed storage
2. Distributed parallel computing
3. Horizontal scaling by adding nodes
4. Moving computation to where the data lives (data locality)
5. Multiple data replicas
What are the characteristics of big data?
(1) Volume
a massive amount of data
(2) Variety
structured data, semi-structured data, and unstructured data
(3) Velocity
a rapid data growth rate
(4) Value
high value hidden in the massive data
What do the shell client operation commands of hdfs mean?
(1) ls
-display file and directory information
(2) mkdir
-create a directory on HDFS; -p creates every level of parent directory in the path
(3) put
-copy a single src or multiple srcs from the local file system to the target file system
(4) get
-copy a file from HDFS to the local file system
(5) appendToFile
-append a file to the end of an existing file
(6) cat
-display the contents of a file
(7) tail
-display the last part of a file
(8) chmod
-change file permissions; -R applies the change recursively through the directory structure
(9) copyFromLocal
-copy files from the local file system to an HDFS path
(10) copyToLocal
-copy from HDFS to the local file system
(11) cp
-copy from one HDFS path to another HDFS path
(12) mv
-move files within the HDFS directory tree
(13) rm
-delete the specified files or empty directories; -r deletes recursively (required for non-empty directories)
(14) df
-display the available space information of the file system
(15) du
-display the size of every file in a directory, or the size of a single file when only one file is specified
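Taken together, a typical session with these commands might look like the following sketch (assuming a running HDFS cluster and the `hdfs` client on the PATH; all paths and file names here are illustrative):

```shell
hdfs dfs -mkdir -p /user/demo/input              # create nested directories
hdfs dfs -put localfile.txt /user/demo/input     # upload from the local FS
hdfs dfs -ls /user/demo/input                    # list the uploaded file
hdfs dfs -cat /user/demo/input/localfile.txt     # print its contents
hdfs dfs -get /user/demo/input/localfile.txt ./copy.txt   # download a copy
hdfs dfs -rm -r /user/demo/input                 # remove recursively
```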
What can big data do?
(1) Quick query of massive data
(2) Storage of massive data (large amount of data, single large file)
(3) Rapid calculation of massive data (compared with traditional tools)
(4) Real-time calculation of massive data (immediately)
(5) Data mining (discovering valuable patterns that were not visible before)
What are the main functions of hdfs?
The main function of HDFS is to store a large amount of data in a distributed manner
In which file is Hadoop's trash can mechanism configured?
It is configured in the core-site.xml file.
What are the trash can configuration parameters?
fs.trash.interval (the retention time, in minutes, for deleted files; 0 disables the trash)
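A minimal sketch of enabling the trash in core-site.xml (the value 1440, one day in minutes, is only an illustrative choice):

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- minutes that deleted files are kept in the trash; 0 disables it -->
  <value>1440</value>
</property>
```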
Command to start the JobHistoryServer service process?
mr-jobhistory-daemon.sh start historyserver (start)
mr-jobhistory-daemon.sh stop historyserver (stop)
What is the default port of the JobHistoryServer web UI?
The default port is 19888
What are the files that need to be configured when installing hadoop?
(1) hadoop-env.sh
(2) core-site.xml
(3) hdfs-site.xml
(4) mapred-site.xml
(5) yarn-site.xml
(6) slaves
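As an illustration of what goes into these files, a minimal hdfs-site.xml entry might look like the fragment below (the value 3 is an assumed example; it is the usual default replication factor):

```xml
<property>
  <name>dfs.replication</name>
  <!-- number of replicas kept for each HDFS block -->
  <value>3</value>
</property>
```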
When HDFS is started for the first time, which command is used to format it?
bin/hdfs namenode -format or bin/hadoop namenode -format
What folders are included in the hadoop installation package directory and what are their functions?
(1) bin
: Hadoop's most basic management and usage scripts
(2) etc
: the directory containing the Hadoop configuration files
(3) include
: header files for the programming libraries provided externally
(4) lib
: the dynamic and static libraries that Hadoop provides externally
(5) libexec
: the shell configuration files used by each service
(6) sbin
: the Hadoop management (start/stop) scripts
(7) share
: the compiled jar packages of each Hadoop module, including the bundled official examples
Hadoop feature advantages?
(1) Scalability (easy capacity expansion)
(2) Low cost
(3) High efficiency
(4) Reliability
What are the ways to deploy Hadoop?
(1) Standalone (local) mode
(2) Pseudo-distributed mode
(3) Cluster (fully distributed) mode
Command for network synchronization?
ntpdate cn.pool.ntp.org
(ntpdate followed by the address of an NTP server)
In which file is the hostname set?
/etc/sysconfig/network
Which file is used to configure IP and hostname mapping?
/etc/hosts
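For example, a cluster's /etc/hosts might map IP addresses to hostnames like this (the addresses and hostnames are illustrative):

```
192.168.1.101   node01
192.168.1.102   node02
192.168.1.103   node03
```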
Command to start HDFS NameNode?
hadoop-daemon.sh start namenode
Start HDFS DataNode on a single node?
hadoop-daemon.sh start datanode
Start YARN ResourceManager on a single node?
yarn-daemon.sh start resourcemanager
What are the one-click startup and shutdown script commands for HDFS clusters?
start-dfs.sh (start script)
stop-dfs.sh (stop script)
A brief overview of the difference between Hadoop's combiner and partitioner
Both the combiner and the partitioner are functions that sit between map and reduce, around the shuffle step. The combiner runs mainly on the map side as a "local reduce": it merges key-value pairs that share the same key before they are sent across the network, and it can be customized. The partitioner decides which reducer each map output record goes to, mapping records to reducers by key (by default via a hash of the key); it can also be customized. In short, the combiner pre-aggregates, while the partitioner classifies keys across reducers.
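The two steps can be imitated locally. The pipeline below is a rough sketch in plain shell (no Hadoop involved): `sort` plays the role of the shuffle, the `awk` aggregation plays the combiner, and a hash of the key modulo the reducer count plays the default hash partitioner. All keys, values, and the reducer count are made up for illustration.

```shell
num_reducers=2
# "map" output: key-value pairs, with the same key appearing several times
printf 'apple 1\nbanana 1\napple 1\n' |
sort |                                              # shuffle: group identical keys
awk '{c[$1]+=$2} END{for (k in c) print k, c[k]}' | # combiner: local sum per key
while read -r key count; do
  h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)   # partitioner: hash(key)
  echo "reducer $((h % num_reducers)): $key $count"
done
```

Because the hash of a key is deterministic, every record with the same key always lands on the same reducer, which is exactly the property the real partitioner must guarantee.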
What does HBase rely on to provide a message communication mechanism?
Zookeeper
Please describe in detail the structure of a Cell in Hbase
A storage unit determined by a row and a column in HBase is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. The data in a cell has no type; it is all stored as raw bytes.
The timing of compact triggering in hbase
1) After a MemStore flush, HBase checks whether a compaction is needed
2) The CompactionChecker thread polls periodically
The difference between hbase and mysql
MySQL stores data by row: the whole row is a unit and is stored together.
HBase stores data by column family: data of the same column family is stored together, which is conducive to compression and statistics.
The compact role of hbase
1. Merge files
2. Clean up expired and deleted data
3. Improve the efficiency of reading and writing data
Big data processing flow
Data production → data collection → data storage → requirement analysis → data preprocessing → data computation → result data storage → result data presentation
How to deal with Hbase downtime
Downtime falls into two cases: HMaster failure and HRegionServer failure. If an HRegionServer goes down, the HMaster redistributes the regions it managed to other live RegionServers; since the data and logs are persisted in HDFS, this operation causes no data loss, so the consistency and safety of the data are guaranteed. HMaster itself has no single point of failure: multiple HMasters can be started in HBase, and ZooKeeper's Master Election mechanism ensures that exactly one Master is always running, so there is always an HMaster available to provide service.