Self-built Hadoop data migration to the cloud Ali EMR

Author: Yun Kui, even rut

Best Practices Overview

Scenarios

IDC customers in a public cloud environment or self Hadoop cluster, the data set stored in HDFS file system for data analysis tasks. But the long-term data can not be saved due to space limitations self HDFS, Hadoop clusters or customers on cloud migration needs. Practice of the present embodiment provides the best practices in the following scene:

Based IPSec VPN tunnels + DistCp (Hadoop native tool) to migrate data to the cloud Ali EMR cluster, the target storage including HDFS, OSS and Ali Ali cloud cloud EMR of Jindo

Technology Architecture

Practice of the present embodiment based on the preparation procedure shown below technical architecture and main processes:
image.png

Advantages

  • Safety
    data secure transmission mode based IPSec VPN / leased line.
  • Low Cost
    is created in the types of EMR Ali cloud Hadoop clusters and self Hadoop cluster has certain cost advantages compared to the same time Ali cloud EMR can use OSS as the underlying storage space and further reduce costs.

Before this article, you need to complete the following


Guess you like

Origin yq.aliyun.com/articles/742132