1. Missing JAR: commons-httpclient
Error:
"HiveServer2-Handler-Pool: Thread-696" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
Fix: add commons-httpclient-3.1.jar to Hive's classpath.
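To confirm whether any JAR on a given path actually provides the missing class, a small script can scan a directory of JARs. This is a minimal Python sketch, assuming all candidate JARs sit in one local directory; the helper names are hypothetical, and it relies only on the fact that a JAR is a ZIP archive whose entries mirror the package path.

```python
import zipfile
from pathlib import Path


def class_to_entry(class_name: str) -> str:
    """Turn 'a.b.C' or 'a/b/C' (as printed by NoClassDefFoundError) into a JAR entry path."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(class_name: str, jar_dir: str) -> list[str]:
    """Return the names of the JARs in jar_dir that contain class_name."""
    entry = class_to_entry(class_name)
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            # A JAR is a ZIP archive; class files are stored under their package path.
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return hits
```

For example, jars_providing("org/apache/commons/httpclient/protocol/ProtocolSocketFactory", "/opt/hive/lib") (the directory is a placeholder) returning an empty list means the class is not on that path and commons-httpclient-3.1.jar still has to be added.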
2. Missing JAR: elasticsearch-hadoop
Error:
FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'
Fix: add the elasticsearch-hadoop JAR whose version matches the Elasticsearch cluster in use, here elasticsearch-hadoop-7.6.1.jar.
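The version check can be scripted. This Python sketch assumes the cluster version is taken from the JSON body returned by GET / on an Elasticsearch node (its version.number field) and that the connector JAR follows the elasticsearch-hadoop-<version>.jar naming used above; the helper names are hypothetical.

```python
import json
import re

# Matches names like "elasticsearch-hadoop-7.6.1.jar" and captures the version part.
_JAR_RE = re.compile(r"^elasticsearch-hadoop-(\d+(?:\.\d+)*)\.jar$")


def es_version(root_response: str) -> str:
    """Return version.number from the JSON body of GET / on an Elasticsearch node."""
    return json.loads(root_response)["version"]["number"]


def has_matching_jar(jar_names: list[str], cluster_version: str) -> bool:
    """True if the JAR for exactly this cluster version is present."""
    return f"elasticsearch-hadoop-{cluster_version}.jar" in jar_names


def mismatched_jars(jar_names: list[str], cluster_version: str) -> list[str]:
    """List elasticsearch-hadoop JARs whose version differs from the cluster's."""
    bad = []
    for name in jar_names:
        m = _JAR_RE.match(name)
        if m and m.group(1) != cluster_version:
            bad.append(name)
    return bad
```

Reporting mismatched JARs separately is useful because a stale connector JAR left on the classpath next to the correct one can still cause confusing errors.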
3. An ES-backed table created in Hive cannot be queried
Error:
Error: java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Expected to find keystore file at [hdfs:///path/to/esh.keystore] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid file URI. (state=,code=0)
In this setup the keystore is stored on HDFS.
Fix: specify the following property in the table creation statement: 'es.nodes.wan.only' = 'true'
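To show where the property sits in a full table definition, this hypothetical Python helper assembles a Hive CREATE EXTERNAL TABLE statement for the EsStorageHandler. It is a sketch for illustration: the table name, columns, index and host used in the test are placeholders, and property values are not escaped, so values containing single quotes would break the statement.

```python
def es_table_ddl(table: str, columns: list[tuple[str, str]], resource: str,
                 nodes: str, wan_only: bool = True, extra=None) -> str:
    """Build a Hive CREATE EXTERNAL TABLE statement backed by ES-Hadoop."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = {"es.resource": resource, "es.nodes": nodes}
    if wan_only:
        # Disable node discovery; talk only to the node(s) listed in es.nodes.
        props["es.nodes.wan.only"] = "true"
    if extra:
        props.update(extra)
    # NOTE: no escaping of keys or values; this is a sketch only.
    prop_lines = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        f"TBLPROPERTIES (\n  {prop_lines}\n)"
    )
```

Printing es_table_ddl("es_demo", [("id", "STRING")], "demo_index", "es.example.com:9200") yields a statement whose TBLPROPERTIES list ends with 'es.nodes.wan.only' = 'true'.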
Explanation:
The es.nodes.wan.only setting is described in the ES-Hadoop configuration reference at https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html:
When Elasticsearch is reached over a WAN, for example a cloud-hosted cluster such as one on AWS or a cluster inside a restricted network, setting this property disables node discovery, and all subsequent reads and writes go only through the node(s) declared in es.nodes. This is what makes a cloud or restricted-network cluster reachable. The trade-off is that every read and write passes through the declared node(s), so performance can suffer significantly.
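The performance cost can be illustrated with a deliberately simplified model, which is not ES-Hadoop's actual routing logic: with discovery disabled, every request lands on the declared node(s); with discovery enabled, requests can be spread across the nodes the connector discovers. The node names and request counts below are made up, and even spreading is an idealization.

```python
def contacted_nodes(declared: list[str], discovered: list[str], wan_only: bool) -> list[str]:
    """Simplified model: which nodes receive Hive's read/write requests."""
    if wan_only:
        # Discovery is disabled, so only the node(s) listed in es.nodes are used.
        return list(declared)
    # Discovery is enabled, so requests can be spread over the discovered nodes.
    return list(discovered)


def requests_per_node(total_requests: int, nodes: list[str]) -> float:
    """Average load per node if requests were spread evenly (an idealization)."""
    return total_requests / len(nodes)


declared = ["gateway:9200"]
discovered = ["data-1:9200", "data-2:9200", "data-3:9200", "data-4:9200"]
wan_load = requests_per_node(1200, contacted_nodes(declared, discovered, True))
lan_load = requests_per_node(1200, contacted_nodes(declared, discovered, False))
```

In this toy setup the single declared node carries all 1200 requests in WAN-only mode, versus 300 per node when the four discovered nodes share the work.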