Troubleshooting Notebook issues in CDH

Copyright notice: This is an original post by the author. Source: http://blog.csdn.net/silentwolfyh https://blog.csdn.net/silentwolfyh/article/details/83383119

Contents:

1. Important notes

2. Notebook configuration

3. Livy configuration

4. Livy version issue

5. Spark version issue

6. Screenshots

7. References



1. Important notes

1. CDH 5.14 ships with Python 2.6.6 by default; it must be upgraded to 2.7. See: https://blog.csdn.net/silentwolfyh/article/details/82839713

2. The livy-server-0.2.0 and livy-server-0.3.0 releases do not work; upgrade to livy-0.5.0-incubating-bin.
Download: http://livy.incubator.apache.org/download/

3. Spark 2.1.0 conflicts with livy-0.5.0-incubating-bin, so switch to Spark 1.6.

4. Put environment variables in /etc/profile rather than ~/.bash_profile; this really matters. My colleague ran into many problems because the hdfs and root users had inconsistent variables, and things only settled down once I unified them.

5. Start Livy as the hdfs user, with the command below (a quick REST health check follows these notes):

nohup ./livy-server &

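Once livy-server is running, you can confirm it is listening by asking its REST API for the session list. This is only a minimal sketch, assuming the requests library is installed, Livy is on its default port 8998, and IP is replaced with the Livy host:

import requests

# List the Livy sessions; a fresh server returns an empty list.
resp = requests.get("http://IP:8998/sessions")
resp.raise_for_status()
print(resp.json())   # e.g. {"from": 0, "total": 0, "sessions": []}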

2. Notebook configuration

Configure Hue's hue_safety_valve.ini in CDH as follows:

[desktop]
app_blacklist=

[spark]
server_url=http://IP:8998/

languages='[{"name": "Scala", "type": "scala"},{"name": "PySpark", "type": "pyspark"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'

[notebook]
show_notebooks=true
enable_batch_execute=true
enable_query_builder=true
enable_query_scheduling=false
[[interpreters]]
[[[hive]]]
      # The name of the snippet.
      name=Hive
      # The backend connection to use to communicate with the server.
      interface=hiveserver2

[[[impala]]]
      name=Impala
      interface=hiveserver2

[[[spark]]]
      name=Scala
      interface=livy

[[[pyspark]]]
      name=PySpark
      interface=livy
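
One thing that is easy to get wrong here is the languages value under [spark]: it is a JSON array embedded inside an INI file, so a stray quote or comma breaks it. An optional sanity check, as a small Python sketch, is to run the exact string through json.loads before pasting it into the safety valve:

import json

# Paste the exact value of the "languages" key between the quotes below.
languages = '[{"name": "Scala", "type": "scala"},{"name": "PySpark", "type": "pyspark"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
for entry in json.loads(languages):
    print(entry["name"] + " -> " + entry["type"])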

3. Livy configuration

Append the following to the end of livy-env.sh:

export SPARK_HOME=/export/servers/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/spark
export JAVA_HOME=/export/servers/jdk1.7.0_71
export PYSPARK_PYTHON=/usr/bin/python
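
PYSPARK_PYTHON is the interpreter Livy's PySpark sessions will use, so it should be the Python 2.7 binary from note 1, not the stock 2.6.6. A small, optional Python check (assuming /usr/bin/python is the path set above):

import subprocess

# "python --version" writes to stderr on Python 2, hence the redirect.
print(subprocess.check_output(["/usr/bin/python", "--version"],
                              stderr=subprocess.STDOUT))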

Append the following to the end of livy.conf:

livy.repl.enableHiveContext = true
livy.impersonation.enabled = true
livy.server.session.timeout = 1h
livy.server.session.factory = yarn/local
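
After restarting Livy with this configuration, you can exercise it end to end without Hue by creating a PySpark session and running a statement through the REST API. The sketch below is only an illustration; it assumes the requests library, the default port 8998, and that IP is replaced with the Livy host:

import json
import time

import requests

LIVY = "http://IP:8998"
HEADERS = {"Content-Type": "application/json"}

# Create a PySpark session, the same kind the Hue Notebook asks for.
resp = requests.post(LIVY + "/sessions",
                     data=json.dumps({"kind": "pyspark"}), headers=HEADERS)
session_id = resp.json()["id"]

# Wait for the session to leave the "starting" state ("idle" means ready;
# "dead" or "error" points at the version problems described below).
while requests.get("%s/sessions/%d" % (LIVY, session_id)).json()["state"] == "starting":
    time.sleep(2)

# Run a trivial statement inside the session and print its current status.
resp = requests.post("%s/sessions/%d/statements" % (LIVY, session_id),
                     data=json.dumps({"code": "print(1 + 1)"}), headers=HEADERS)
statement_id = resp.json()["id"]
time.sleep(2)
print(requests.get("%s/sessions/%d/statements/%d" % (LIVY, session_id, statement_id)).json())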

4. Livy version issue

Error:
The following appears in the log printed when livy-server is started.

ERROR repl.PythonInterpreter: Process has died with 1

Fix:
Switch to the livy-0.5.0-incubating-bin.zip release.
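
When a session dies like this, the same error can usually be retrieved through the REST API as well, which helps when Hue only reports that the session failed. A small sketch (again assuming the requests library, port 8998, and the id of the dead session, here hypothetically 0):

import requests

# Fetch the driver log lines Livy kept for the dead session.
log = requests.get("http://IP:8998/sessions/0/log",
                   params={"from": 0, "size": 100}).json()
for line in log.get("log", []):
    print(line)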

5. Spark version issue

Error:
Running a Hive query from the PySpark shell under Spark 2.1.0.cloudera3 fails with the stack trace below.

Fix:
Switch to Spark 1.6.0.

18/10/24 17:28:56 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0.cloudera3
      /_/

Using Python version 2.7.10 (default, Oct 23 2018 13:12:58)
SparkSession available as 'spark'.
>>>
>>> from pyspark.sql import HiveContext
>>> sqlContext.sql("select * from dwd.dwd_t1_branch_lt_a limit 100").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/pyspark/sql/context.py", line 384, in sql
    return self.sparkSession.sql(sqlQuery)
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/pyspark/sql/session.py", line 545, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o47.sql.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.orc.DefaultSource could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:575)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$8$$anonfun$9.apply(HiveMetastoreCatalog.scala:293)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$8$$anonfun$9.apply(HiveMetastoreCatalog.scala:283)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$8.apply(HiveMetastoreCatalog.scala:283)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$8.apply(HiveMetastoreCatalog.scala:275)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:69)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToLogicalRelation(HiveMetastoreCatalog.scala:275)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$.org$apache$spark$sql$hive$HiveMetastoreCatalog$ParquetConversions$$convertToParquetRelation(HiveMetastoreCatalog.scala:371)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$$anonfun$apply$1.applyOrElse(HiveMetastoreCatalog.scala:388)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$$anonfun$apply$1.applyOrElse(HiveMetastoreCatalog.scala:379)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:290)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:290)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:289)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:287)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$.apply(HiveMetastoreCatalog.scala:379)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$.apply(HiveMetastoreCatalog.scala:358)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:600)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
  Location:
    org/apache/spark/sql/hive/orc/DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;[Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Map;)Lorg/apache/spark/sql/sources/HadoopFsRelation; @35: areturn
  Reason:
    Type 'org/apache/spark/sql/hive/orc/OrcRelation' (current frame, stack[0]) is not assignable to 'org/apache/spark/sql/sources/HadoopFsRelation' (from method signature)
  Current Frame:
    bci: @35
    flags: { }
    locals: { 'org/apache/spark/sql/hive/orc/DefaultSource', 'org/apache/spark/sql/SQLContext', '[Ljava/lang/String;', 'scala/Option', 'scala/Option', 'scala/collection/immutable/Map' }
    stack: { 'org/apache/spark/sql/hive/orc/OrcRelation' }
  Bytecode:
    0x0000000: b200 1c2b c100 1ebb 000e 592a b700 22b6
    0x0000010: 0026 bb00 2859 2c2d b200 2d19 0419 052b
    0x0000020: b700 30b0

	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
	at java.lang.Class.getConstructor0(Class.java:3075)
	at java.lang.Class.newInstance(Class.java:412)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 72 more

>>> ^C
Traceback (most recent call last):
  File "/export/servers/cloudera/parcels/SPARK2-2.1.0.cloudera3-1.cdh5.13.3.p0.569822/lib/spark2/python/pyspark/context.py", line 240, in signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt
>>>
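
After switching Livy's SPARK_HOME to the Spark 1.6 installation, it is worth confirming which version actually gets picked up before retrying the query. A small optional check (the parcel path below is the one used in livy-env.sh above; adjust it to wherever Spark 1.6 lives on your cluster):

import os
import subprocess

spark_home = os.environ.get(
    "SPARK_HOME",
    "/export/servers/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/spark")
# "spark-submit --version" prints its banner to stderr, hence the redirect.
print(subprocess.check_output(
    [os.path.join(spark_home, "bin", "spark-submit"), "--version"],
    stderr=subprocess.STDOUT))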

6. Screenshots

(Screenshot: the working Notebook page in Hue.)

7. References

Run Hue Spark Notebook on Cloudera

https://blogs.msdn.microsoft.com/pliu/2016/06/18/run-hue-spark-notebook-on-cloudera/?replytocom=65#respond

https://blogs.msdn.microsoft.com/pliu/2016/06/19/run-jupyter-notebook-on-cloudera/

https://stackoverflow.com/questions/42160297/livy-pyspark-python-session-error-in-jypyter-with-spark-magic-error-repl-pytho

https://stackoverflow.com/questions/52297013/when-trying-to-use-pyspark-with-livy-i-get-pyspark-gateway-secret-error
