What is HDFS memory storage?

2019-06-11

Keywords: Hadoop memory storage, HDFS storage structure, LAZY_PERSIST


 

HDFS is, at heart, a file system used to store files. And HDFS has two notable ways of storing them:

1. Memory storage

2. Heterogeneous storage

In this article, let's have a quick look at HDFS "memory storage".

 

First, what exactly is "memory storage"?

 

Well, of course, it means using memory to store the data! In HDFS, "memory storage" is what we often hear called "LAZY_PERSIST". When creating a file on HDFS we can set its storage policy to "LAZY_PERSIST"; from then on, data appended to that file is stored directly in memory on the corresponding DataNode. The benefits of using memory as a storage medium hardly need explaining, but keeping data only in memory is also a serious headache: memory capacity is small, and a power failure wipes the data out, and either problem is enough to make the people responsible break out in a cold sweat. So how do we avoid these risks without giving up memory storage entirely? After all, its benefits are just too tempting. We mitigate its weaknesses as far as we can, and the most suitable remedy is "asynchronous persistence".

 

What is "asynchronous persistence"? It means that file data is stored in memory first, while a background thread keeps persisting the older data to disk. When the system is shut down normally, only a short time is needed to flush the remaining in-memory data to disk; and even if the server loses power or crashes, only a small portion of the data is lost, which engineers can tolerate in many scenarios. This "asynchronous persistence" process is completely transparent to the end user. It is precisely this mechanism of lazily landing data on disk that gives memory storage its English name, "LAZY_PERSIST".

 

Now that we understand the basic concept of memory storage, how do we actually use it in day-to-day development?

 

In fact, there are two ways to enable memory storage mode:

1. Via the command line

2. Via Java code

 

For the first way, the command line, the following command can be used:

hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>

Note, however, that this policy can only be set on a directory. Below is an example of enabling memory storage mode from the command line:

[chorm@m254 ~]$ hdfs storagepolicies -setStoragePolicy -path lemontea2 -policy LAZY_PERSIST
Set storage policy LAZY_PERSIST on lemontea2
[chorm@m254 ~]$ 

The above command sets the lemontea2 directory I created to LAZY_PERSIST mode; from then on, every file created in this directory will be stored in memory.

 

The second way, via Java code, takes effect when the file is created. We can find the following method in the abstract class org.apache.hadoop.fs.FileSystem:

public FSDataOutputStream create(Path f,
    FsPermission permission,
    EnumSet<CreateFlag> flags,
    int bufferSize,
    short replication,
    long blockSize,
    Progressable progress) throws IOException {
  return create(f, permission, flags, bufferSize, replication,
      blockSize, progress, null);
}

The flags parameter of this method is what sets the storage mode. All the modes supported by flags are listed below, from the CreateFlag Javadoc:

CreateFlag specifies the file create semantics. Users can combine flags like:
EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND)

Use the CreateFlag as follows:
1. CREATE - to create a file if it does not exist, else throw FileAlreadyExists.
2. APPEND - to append to a file if it exists, else throw FileNotFoundException.
3. OVERWRITE - to truncate a file if it exists, else throw FileNotFoundException.
4. CREATE|APPEND - to create a file if it does not exist, else append to an existing file.
5. CREATE|OVERWRITE - to create a file if it does not exist, else overwrite an existing file.
6. SYNC_BLOCK - to force closed blocks to the disk device. In addition Syncable.hsync() should be called after each write, if true synchronous behavior is required.
7. LAZY_PERSIST - Create the block on transient storage (RAM) if available.
8. APPEND_NEWBLOCK - Append data to a new block instead of the end of the last partial block.

By passing the LAZY_PERSIST flag to this method when creating a file from Java code, we can enable memory storage for that file.
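As a sketch, creating a LAZY_PERSIST file through this API might look like the following. This assumes a Hadoop client on the classpath and a reachable cluster; the path, buffer size, and payload are made-up illustration values (note that LAZY_PERSIST writes are single-replica):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistCreate {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/lemontea2/demo.txt"); // hypothetical path

        // CREATE | LAZY_PERSIST: create the file and ask for transient (RAM) storage
        EnumSet<CreateFlag> flags =
                EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST);

        try (FSDataOutputStream out = fs.create(path,
                FsPermission.getFileDefault(),
                flags,
                4096,                       // buffer size
                (short) 1,                  // replication: lazy-persist writes are single-replica
                fs.getDefaultBlockSize(path),
                null)) {                    // no progress callback
            out.write("hello lazy persist".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The only difference from an ordinary create is the extra CreateFlag.LAZY_PERSIST in the flag set; everything else is the standard FileSystem API.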

 

So how does a DataNode operate in memory storage mode?

 

Three roles are involved in the day-to-day housekeeping of memory storage mode inside the DataNode. The first is RamDiskReplicaLruTracker, the big housekeeper, responsible for tracking the metadata of all replicas currently held in memory. The second is LazyWriter, the supervisor: it keeps pulling replicas off the tracker's block list and hands each one to the third role to be persisted to disk. That last role is RamDiskAsyncLazyPersistService, which is essentially a thread pool responsible for actually writing the data out to disk.
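To make the division of labour concrete, here is a toy, stdlib-only sketch of that pipeline. It is emphatically not Hadoop's real code: a plain queue stands in for RamDiskReplicaLruTracker, a poller thread for LazyWriter, and a small thread pool for RamDiskAsyncLazyPersistService.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws Exception {
        // "Tracker": in-memory record of replicas waiting to be persisted
        BlockingQueue<String> tracker = new LinkedBlockingQueue<>();
        // "Async persist service": a pool that does the actual disk writes
        ExecutorService persistPool = Executors.newFixedThreadPool(2);
        Path diskDir = Files.createTempDirectory("persisted");

        // "LazyWriter": keeps pulling replicas off the tracker and hands
        // each one to the pool to be written out in the background
        Thread lazyWriter = new Thread(() -> {
            try {
                while (true) {
                    String block = tracker.take();
                    if (block.equals("STOP")) {
                        break;
                    }
                    persistPool.submit(() -> {
                        try {
                            Files.write(diskDir.resolve(block), block.getBytes());
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    });
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        lazyWriter.start();

        // Client writes land in memory first; persistence happens lazily
        tracker.put("block-1");
        tracker.put("block-2");
        tracker.put("STOP");

        lazyWriter.join();
        persistPool.shutdown();
        persistPool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(Files.exists(diskDir.resolve("block-1"))
                && Files.exists(diskDir.resolve("block-2")));
    }
}
```

The point of the sketch is only the shape of the cooperation: the writer thread never blocks on disk I/O itself; it just queues work for the pool, which is what makes the persistence "lazy".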

 

Finally, what do we need before we can actually use memory storage mode in development?

 

We know that memory storage mode keeps data in memory, but HDFS data files are, to put it bluntly, still files on the local file system of our physical machines; HDFS cannot simply cache received data directly in RAM. Is there a way to let HDFS think it is writing data to the local file system while the data actually stays in memory? The answer is a Linux virtual disk (RAM disk): memory is mapped as a disk, so accessing memory looks just like accessing a disk. How to set up such a virtual disk on Linux is not covered here; in short, you mount a tmpfs (in the manner of /dev/shm) onto a directory, and then put that directory into hdfs-site.xml. In general, the following three conditions must be met in order to use memory storage mode:

1. Make sure a virtual disk exists on the local file system, and add it to the dfs.datanode.data.dir configuration item in hdfs-site.xml, separated from the other directories by commas, with the [RAM_DISK] tag prefixed to the virtual disk path:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/data/1,/var/data/2,[RAM_DISK]/mnt/mytmpfs/0</value>
</property>

2. Make sure HDFS's heterogeneous storage policy feature is enabled, i.e. the dfs.storage.policy.enabled configuration item.

3. Make sure the lockable-memory limit is set to a sensible value, i.e. the dfs.datanode.max.locked.memory configuration item.
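Putting the three conditions together, the relevant hdfs-site.xml fragment might look like the following sketch (the tmpfs mount point and the 2 GiB value are made-up examples; dfs.datanode.max.locked.memory is given in bytes and must not exceed the DataNode user's "ulimit -l" limit):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/data/1,/var/data/2,[RAM_DISK]/mnt/mytmpfs/0</value>
</property>
<property>
  <name>dfs.storage.policy.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- bytes; example value: 2 GiB -->
  <value>2147483648</value>
</property>
```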

 

And that's that ~

 


 


Origin www.cnblogs.com/chorm590/p/11003159.html