Impala debugging: pre-deployment setup

I. Prepare LLVM
1. Copy LLVM from nobida143: scp -rq nobida143:/opt/llvm-3.3 /opt/
Add LLVM_HOME: vim ~/.bashrc and add the line export LLVM_HOME=/opt/llvm-3.3
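A minimal shell sketch of step 1 (assuming passwordless scp access to nobida143 and write permission on /opt):
# copy the prebuilt LLVM 3.3 tree from nobida143
scp -rq nobida143:/opt/llvm-3.3 /opt/
# persist LLVM_HOME for future shells and load it into the current one
echo 'export LLVM_HOME=/opt/llvm-3.3' >> ~/.bashrc
source ~/.bashrc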
II. Prepare BOOST
2. Copy Boost from nobida143: scp -rq nobida143:/usr/local/lib/boost /usr/local/lib/
3. vim /etc/ld.so.conf.d/boost-x86_64.conf and add the line /usr/local/lib/boost
4. Run ldconfig to refresh the dynamic linker cache (a sketch of steps 2-4 follows).
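A sketch of steps 2-4, assuming root privileges on the target machine:
# copy the Boost libraries from nobida143
scp -rq nobida143:/usr/local/lib/boost /usr/local/lib/
# register the directory with the dynamic linker and rebuild its cache
echo '/usr/local/lib/boost' > /etc/ld.so.conf.d/boost-x86_64.conf
ldconfig
ldconfig -p | grep boost    # verify the Boost libraries are now resolvable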
III. Prepare Maven
IV. Impala compilation (Hadoop and Hive use the copies in Impala's thirdparty directory)
5. cd /home/data2/wangyh/Impala-cdh5-2.0_5.2.0/
6. Modify impala-config.sh:
export HIVE_HOME=$IMPALA_HOME/thirdparty/hive-${IMPALA_HIVE_VERSION}
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HADOOP_HOME=$IMPALA_HOME/thirdparty/hadoop-${IMPALA_HADOOP_VERSION}
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
7. source bin/impala-config.sh
8. ./build-all.sh -notests -noclean (a sketch of steps 5-8 follows)
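Putting steps 5-8 together, a sketch of the build sequence (the edits to impala-config.sh are the four export lines above):
cd /home/data2/wangyh/Impala-cdh5-2.0_5.2.0/
vim bin/impala-config.sh          # point HIVE_HOME/HADOOP_HOME at the thirdparty copies
source bin/impala-config.sh       # exports IMPALA_HOME and the variables above
./build-all.sh -notests -noclean  # skip the test build, keep previous build artifacts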
9. Modify the core-site.xml, hdfs-site.xml, and slaves files under thirdparty/hadoop-2.5.0-cdh5.2.0/etc/hadoop (the items originally marked in red are the ones that need to be modified).
core-site.xml is as follows:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nobida145:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
</configuration>
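Once HADOOP_CONF_DIR points at this directory, the NameNode URI can be sanity-checked; this check is not part of the original steps, but hdfs getconf ships with stock Hadoop 2.x:
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS    # should print hdfs://nobida145:8020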


hdfs-site.xml is as follows:
<configuration>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/data3/secondarynamenode</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/data1/hadoop-cdh5.2-nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/data6/hdfs-data,/home/data7/hdfs-data,/home/data8/hdfs-data,/home/data9/hdfs-data</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
<property>
   <name>dfs.client.use.legacy.blockreader.local</name>
   <value>false</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>root</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/data1/hdfs-data</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
</configuration>

<!-- fs.default.name - a URI (protocol, hostname, and port) describing the NameNode of the cluster. Every machine in the cluster needs to know the NameNode's address. DataNodes register with the NameNode first so that their data becomes available, and standalone client programs talk to the NameNode through this URI to obtain a file's block list. -->
<!-- dfs.data.dir - the local filesystem path where a DataNode stores its data. The path does not have to be identical on every DataNode, since each machine's environment may differ, but using the same path everywhere makes administration easier. By default it is derived from hadoop.tmp.dir, which is only suitable for testing because data may be lost, so this value should be overridden.
dfs.name.dir - the local filesystem path where the NameNode stores the filesystem metadata. It is only meaningful for the NameNode; DataNodes do not use it. The same warning about /tmp-style defaults applies here, so in practice this value should also be overridden. -->
<!-- hadoop.tmp.dir - the base directory that the Hadoop filesystem depends on; many other paths are derived from it. If the NameNode and DataNode storage locations are not configured in hdfs-site.xml, they default to subdirectories of this path. -->
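Because dfs.client.read.shortcircuit is enabled and dfs.domain.socket.path is set to /var/run/hadoop-hdfs/dn._PORT, the socket directory must exist before the DataNode starts. A minimal sketch, assuming the daemons run as root (matching dfs.block.local-path-access.user above):
mkdir -p /var/run/hadoop-hdfs
chown root:root /var/run/hadoop-hdfs    # the socket directory should be owned by the DataNode user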

The slaves file is as follows:
nobida145
10. Modify hive-site.xml under thirdparty/hive-0.13.1-cdh5.2.0/conf.
hive-site.xml is as follows (see the attached hive-site.xml).

11. Modify bin/set-classpath.sh
CLASSPATH=\
$IMPALA_HOME/conf:\
$IMPALA_HOME/fe/src/test/resources:\
$IMPALA_HOME/fe/target/classes:\
$IMPALA_HOME/fe/target/dependency:\
$IMPALA_HOME/fe/target/test-classes:\
${HIVE_HOME}/lib/datanucleus-api-jdo-3.2.1.jar:\
${HIVE_HOME}/lib/datanucleus-core-3.2.2.jar:\
${HIVE_HOME}/lib/datanucleus-rdbms-3.2.1.jar:
Add the line $IMPALA_HOME/conf:\ (shown above), create a conf folder under $IMPALA_HOME, and place the three files core-site.xml, hdfs-site.xml, and hive-site.xml into that conf directory (a sketch follows).
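A sketch of this conf setup, assuming bin/impala-config.sh has already been sourced so the variables below resolve:
mkdir -p $IMPALA_HOME/conf
cp $HADOOP_CONF_DIR/core-site.xml $HADOOP_CONF_DIR/hdfs-site.xml $IMPALA_HOME/conf/
cp $HIVE_CONF_DIR/hive-site.xml $IMPALA_HOME/conf/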

12. Run hadoop namenode -format, then start DFS and Hive.
13. Start Impala: bin/start-impala-cluster.py -s 1 (a combined sketch of steps 12-13 follows).
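A combined sketch of steps 12-13; exactly how Hive is started here is an assumption, since the original only says to start dfs and hive:
$HADOOP_HOME/bin/hadoop namenode -format         # one-time format of the NameNode
$HADOOP_HOME/sbin/start-dfs.sh                   # starts the NameNode and the DataNodes listed in slaves
$HIVE_HOME/bin/hive --service metastore &        # assumed: run the Hive metastore in the background
cd $IMPALA_HOME && bin/start-impala-cluster.py -s 1    # one impalad plus statestored and catalogd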
V. Errors encountered
1. Impala cannot read or write HDFS: the cause was that the $IMPALA_HOME/conf entry in bin/set-classpath.sh (step 11) had been forgotten.
2. The DataNode or NameNode fails to start: empty the folder that hadoop.tmp.dir points to and delete the folders that dfs.datanode.data.dir points to, then re-format (analyze the exact cause from the logs; a cleanup sketch follows).
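For error 2, a cleanup sketch using the paths from the hdfs-site.xml above (destructive, so only on this test cluster):
rm -rf /home/data1/hdfs-data/*                   # empty hadoop.tmp.dir
rm -rf /home/data6/hdfs-data /home/data7/hdfs-data /home/data8/hdfs-data /home/data9/hdfs-data    # remove dfs.datanode.data.dir
$HADOOP_HOME/bin/hadoop namenode -format         # re-format before restarting dfs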
