Hadoop 2.7.3 deployment on Windows 10

When configuring Hadoop on Windows 10, the JDK installation path must not contain spaces. In other words, the JDK cannot be installed under Program Files.
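The space rule can be checked before starting Hadoop. The following is a minimal sketch, assuming the JDK location is exposed through the JAVA_HOME environment variable; the helper names `path_has_spaces` and `check_java_home` and the example paths are hypothetical and used only for illustration.

```python
import os


def path_has_spaces(path: str) -> bool:
    """Return True if the given path contains a space character."""
    return " " in path


def check_java_home(env=None) -> str:
    """Describe whether JAVA_HOME looks usable for Hadoop on Windows.

    `env` defaults to the real process environment; a plain dict can be
    passed instead to try the check without touching the environment.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if path_has_spaces(java_home):
        return f"JAVA_HOME contains spaces: {java_home}"
    return f"JAVA_HOME looks fine: {java_home}"


if __name__ == "__main__":
    print(check_java_home())
```

A path such as `C:\Program Files\Java\jdk1.8.0_111` would be flagged, while a hypothetical `C:\Java\jdk1.8.0_111` would pass.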

In addition, some of the cmd commands shipped with Hadoop do not work on Windows, so a replacement set has to be downloaded separately. After downloading, overwrite all the files in Hadoop's bin directory with it.

Download address: http://download.csdn.NET/detail/kokjuis/9706480

 

For a very detailed walkthrough, see http://blog.csdn.net/kokjuis/article/details/53537029

----------------

 

RAID: redundant array of independent disks.

An ordinary disk has a throughput of roughly 100-150 MB/s.

RAID-0 uses at least 2 hard disks and increases throughput. Data is striped across the disks, so reads and writes are spread over all of them. The disadvantage is that there is no redundancy: if one disk fails, the data on the other disk cannot be read either.

RAID-1 uses 2 hard disks that mirror each other in real time. Both disks record the same data; the disadvantage is that half of the raw capacity is spent on the copy.

RAID-5 uses at least 3 hard disks and also stripes data. With 3 disks, each stripe stores the original data on two of the disks and the parity data on the remaining disk. Which disk holds the parity is not fixed; it rotates from stripe to stripe. If one disk holding original data fails, its contents can be recovered from the parity data and the data on the other disk.
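The recovery step can be illustrated with XOR parity, which is the usual way RAID-5 parity is computed. This is a minimal sketch for a single stripe on three disks; the helper name `xor_bytes` and the sample chunks are made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe: two data chunks and one parity chunk, each on its own disk.
data_disk_a = b"HADOOP"
data_disk_b = b"BLOCKS"
parity_disk = xor_bytes(data_disk_a, data_disk_b)

# Suppose the disk holding data_disk_a fails. XOR-ing the surviving
# data chunk with the parity chunk reproduces the lost chunk.
recovered_a = xor_bytes(data_disk_b, parity_disk)
assert recovered_a == data_disk_a
```

The same identity works in the other direction: losing `data_disk_b` is repaired from `data_disk_a` and the parity, and losing the parity chunk is repaired by recomputing it from the two data chunks.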

RAID-10 uses at least 4 hard disks: the disks are first paired into RAID-1 mirrors, and the mirrored pairs are then striped together with RAID-0.
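The trade-off between the four levels is easiest to see as usable capacity. The sketch below assumes equally sized disks and the minimum disk counts listed above; the function name `usable_capacity_gb` is hypothetical.

```python
def usable_capacity_gb(level: str, disks: int, disk_size_gb: float) -> float:
    """Usable capacity of an array of equally sized disks.

    level: "0", "1", "5" or "10".
    """
    if level == "0":
        if disks < 2:
            raise ValueError("RAID-0 needs at least 2 disks")
        return disks * disk_size_gb          # striping only, nothing spent on redundancy
    if level == "1":
        if disks != 2:
            raise ValueError("this sketch models RAID-1 as a 2-disk mirror")
        return disk_size_gb                  # second disk is a full copy
    if level == "5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_size_gb    # one disk's worth of parity
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_size_gb   # every disk has a mirror
    raise ValueError(f"unknown RAID level: {level}")
```

For 1000 GB disks this gives 2000 GB for RAID-0 on 2 disks, 1000 GB for RAID-1 on 2 disks, 2000 GB for RAID-5 on 3 disks, and 2000 GB for RAID-10 on 4 disks.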

RAID is recommended for the NameNode, but DataNodes do not need it, because replicas of each block are already stored on different nodes.

The NameNode records only changes to metadata, and it stores all paths. Opening, deleting, and renaming files, as well as operations on directories, are recorded in the log; operations on the data itself are not logged. When a DataNode fails, the NameNode is responsible for creating more replicas.

The NameNode maintains two tables. The first maps the namespace (paths) to blocks and is stored on disk. The second maps blocks to DataNodes (hosts); it is not stored on disk but kept only in memory, so every time the NameNode starts it has to rebuild the block-to-DataNode mapping. When a DataNode starts, it reports its own blocks to the NameNode.
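The two tables can be modelled with two dictionaries. This is a toy sketch of the idea only, not how the real NameNode is implemented; the class `ToyNameNode`, its methods, and the block and host names are all hypothetical.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model of the two NameNode tables described above."""

    def __init__(self):
        # Table 1: path -> ordered list of block ids (persisted in the real system).
        self.namespace = {}
        # Table 2: block id -> set of DataNode names (memory only).
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        """Replace everything known about `datanode` with its latest report."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def restart(self):
        """Simulate a restart: table 1 survives, table 2 is lost."""
        self.block_locations = defaultdict(set)

    def locate(self, path):
        """Return [(block_id, sorted DataNode names)] for a file."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, set())))
            for block_id in self.namespace[path]
        ]
```

After `restart()`, `locate()` still returns the block ids of every file, but with empty host lists until the DataNodes send their block reports again.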

The NameNode periodically receives block reports (lists of all data blocks) from the DataNodes, and the NameNode is responsible for replica creation.
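Deciding which blocks need new replicas comes down to comparing the reported locations with the set of live DataNodes. The sketch below assumes a target of 3 replicas per block; the function name `under_replicated` and the sample block and host names are hypothetical.

```python
def under_replicated(block_locations, live_datanodes, target=3):
    """Return {block_id: number of replicas still to create}.

    block_locations: dict mapping block id -> iterable of DataNode names.
    live_datanodes: iterable of DataNode names currently alive.
    target: desired replica count (3 is assumed here).
    """
    live = set(live_datanodes)
    missing = {}
    for block_id, holders in block_locations.items():
        live_replicas = len(set(holders) & live)
        if live_replicas < target:
            missing[block_id] = target - live_replicas
    return missing


locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn4", "dn5"},
}
# dn1 has failed, so only blk_1 has lost a replica.
print(under_replicated(locations, {"dn2", "dn3", "dn4", "dn5"}))
```

In this toy run only `blk_1` is reported, with one replica to create, because `blk_2` never had a copy on the failed node.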

Hadoop stores replicas across racks: replica 1 goes on one rack, and replicas 2 and 3 go on another rack, so that a power outage in one rack does not take out every copy of a block.
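The placement rule can be sketched as a small function. This is a simplified, deterministic illustration of "one replica on one rack, two on another", not Hadoop's actual placement code; the function name `place_replicas` and the rack and host names are hypothetical.

```python
def place_replicas(topology, first_rack):
    """Pick three DataNodes: one on `first_rack`, two on one other rack.

    topology: dict mapping rack name -> list of DataNode names.
    Returns [replica1_node, replica2_node, replica3_node].
    """
    if not topology.get(first_rack):
        raise ValueError(f"no DataNodes on rack {first_rack!r}")
    replica1 = topology[first_rack][0]
    for rack, nodes in topology.items():
        # Replicas 2 and 3 share a rack that differs from replica 1's rack.
        if rack != first_rack and len(nodes) >= 2:
            return [replica1, nodes[0], nodes[1]]
    raise ValueError("need a second rack with at least two DataNodes")


topology = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
}
print(place_replicas(topology, "rack1"))
```

With this topology, losing power on `rack1` leaves two replicas on `rack2`, and losing `rack2` still leaves the copy on `rack1`.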

A block is the unit of file storage in Hadoop. The default block size is 64 MB in v1 and 128 MB in v2.
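The block size determines how many blocks a file is split into. A minimal sketch of that arithmetic follows, using the two default sizes mentioned above; the function names and the 300 MB example file are hypothetical.

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int = 128 * MB) -> int:
    """Number of blocks needed to store a file (ceiling division)."""
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    return -(-file_size_bytes // block_size_bytes)


def last_block_size(file_size_bytes: int, block_size_bytes: int = 128 * MB) -> int:
    """Size in bytes of the final block; only full blocks occupy a full block size."""
    if file_size_bytes == 0:
        return 0
    remainder = file_size_bytes % block_size_bytes
    return remainder if remainder else block_size_bytes


size = 300 * MB
print(block_count(size), last_block_size(size) // MB)           # 128 MB blocks
print(block_count(size, 64 * MB), last_block_size(size, 64 * MB) // MB)  # 64 MB blocks
```

A 300 MB file therefore takes 3 blocks at 128 MB (the last one holding 44 MB) and 5 blocks at 64 MB (the last one also holding 44 MB).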

 
