No module named 'col' when running PySpark SQL

Foreword:

  • Alas, I fell into a pit again. It had me confused for a long time, but every pit you climb out of teaches you something, and that is how you grow. So let me record this one here; it may point the way for the friends who come after.
  • Here is the background: I am using Spark Structured Streaming, and it has been exhausting. There are hardly any tutorials; the official documentation only covers simple introductory usage, and in actual development it can be genuinely hard to find solutions...

The problem:

  • My import statement is as follows:
from pyspark.sql.functions import col
  • When I ran it, the editor flagged it with an annoying red error:

  • No module named 'col'

  • What? It felt like the editor was messing with me.

  • I searched online for a long time without finding the cause, and finally found the answer on Stack Overflow. Maybe the question is too naive for most posts to mention it.

  • In fact, the cause of this error is not the PySpark source package but the code editor: tools that rely on static analysis cannot resolve the name, so they report an error even though the code runs fine.

  • Why it happens:
    Look at an example first:

# =====================my_module.py==========================
# create a function named func
globals()["func"] = lambda x: print(x)
# collect the names defined above from this module's globals
__all__ = [x for x in globals() if x.startswith("func")]
# ===========================end=============================

# =======================test.py=============================
# now import the function defined in the module above
from my_module import func
func("test")
# run this in an editor and the static check fails: it reports that
# func cannot be found, because tools that rely purely on static code
# analysis cannot see names that are only defined at runtime
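
This is essentially what pyspark.sql.functions does: in PySpark 2.x, many of its functions, col among them, were generated in a loop at import time rather than written as ordinary def statements, so a linter reading the file never sees a definition of col. Below is a simplified sketch of that pattern, written from memory of the 2.x source rather than copied from it; details vary by version (newer 3.x releases define these functions explicitly, which is why the warning has largely disappeared there):

# simplified, illustrative sketch of how pyspark.sql.functions (2.x)
# built `col` and friends dynamically at import time
from pyspark import SparkContext
from pyspark.sql.column import Column

_functions = {
    'col': 'Returns a Column based on the given column name.',
    'lit': 'Creates a Column of literal value.',
}

def _create_function(name, doc=""):
    # returns a Python function that delegates to the JVM function of the same name
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

# the names `col` and `lit` only come into existence here, at import time
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)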
  • There are three solutions:
    1. If you use VS Code, you can fix it by adding the following to python.linting.pylintArgs in your settings.json:
"python.linting.pylintArgs": [
        "--generated-members=pyspark.*",
        "--extension-pkg-whitelist=pyspark",
        "--ignored-modules=pyspark.sql.functions"
    ]
    2. Install the Python package pyspark-stubs, which provides type stubs that fix these static-analysis false positives.
    Note that x.x.x should be replaced with your own PySpark version number (a quick way to check it is shown after this list):
pip install pyspark-stubs==x.x.x
    3. This method is actually the simplest: import the module itself and access col through it. It merely sidesteps the editor's check, but it is perfectly workable (a runnable sketch follows this list):
import pyspark.sql.functions as f
f.col("values")
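
For option 2, you can print the installed PySpark version to know which pyspark-stubs release to pin; this is just an ordinary version check, nothing PySpark-specific:

# check the installed PySpark version, then install the matching stubs,
# e.g. 2.4.0 -> pip install pyspark-stubs==2.4.0
import pyspark
print(pyspark.__version__)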
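
And for option 3, here is a minimal end-to-end sketch; the app name "col-demo" and the column name "values" are placeholders of my own, not anything required:

import pyspark.sql.functions as f
from pyspark.sql import SparkSession

# build a tiny DataFrame and reference a column through the module alias;
# the linter resolves `f` (a real module) and never inspects `f.col`
spark = SparkSession.builder.appName("col-demo").getOrCreate()
df = spark.createDataFrame([(1,), (2,)], ["values"])
df.select(f.col("values")).show()
spark.stop()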

Words to motivate yourself:
I am still too young, and there are many more pits ahead of me. Keep going,
never give up!

Original post: blog.csdn.net/qq_42359956/article/details/105658763