A Collection of Solutions to Common Python Problems

1. (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape, for example:

>>>import sys
>>>sys.path.append("c:\users\administrator\desktop\py\san.py")

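Solution: in Python the backslash is the escape character, and \u (or \U) introduces a Unicode escape, so \users is parsed as a malformed escape sequence and raises the error. Prefix the string with r to make it a raw string: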

>>>import sys
>>>sys.path.append(r"c:\users\administrator\desktop\py\san.py")
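
As a small sketch of the same fix, the three spellings below denote the same Windows path. The folder name is taken from the example above; pathlib.PureWindowsPath is used here only to compare the spellings and does not touch the file system.

```python
from pathlib import PureWindowsPath

raw = r"c:\users\administrator\desktop\py"          # raw string: backslashes stay literal
escaped = "c:\\users\\administrator\\desktop\\py"   # every backslash doubled
forward = "c:/users/administrator/desktop/py"       # forward slashes are accepted on Windows

# All three spellings refer to the same path
same = (raw == escaped
        and PureWindowsPath(raw) == PureWindowsPath(forward))
print(same)
```

Note that sys.path entries are normally directories, so the sketch uses the folder rather than the san.py file itself.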

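2. urllib.request.urlopen() raises an error when the URL contains Chinese characters
Solution: percent-encode the Chinese part of the link with urllib.parse.quote() first, as follows: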

import urllib.request
import urllib.parse
url = "http://www.baidu.com/s?wd=" +urllib.parse.quote("中国")
resp = urllib.request.urlopen(url)
print(resp.read().decode('utf-8'))

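Note: quote only the Chinese part. Passing the whole URL to quote() also fails, because by default it escapes characters such as : and ? as well.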
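
The difference can be sketched without any network access. The URL and keyword are the ones from the example above; the safe=":/?=&" argument is one possible choice for keeping the structural characters of a simple URL intact, not the only one.

```python
import urllib.parse

keyword = "中国"
base = "http://www.baidu.com/s?wd="

# Option 1: quote only the non-ASCII query value
url_a = base + urllib.parse.quote(keyword)

# Option 2: quote the whole URL but mark the structural characters as safe
url_b = urllib.parse.quote(base + keyword, safe=":/?=&")

# Quoting the whole URL with the default safe="/" also escapes ':' '?' '=',
# which is why urlopen() rejects the result
broken = urllib.parse.quote(base + keyword)

print(url_a)
print(broken)
```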

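3. Error: OSError: [Errno 22] Invalid argument: 'C:/Users/Administrator/Desktop/py/newpy/父亲节 | 致爱你最深 却又从不表达的人.txt'
Solution:
a. Path escaping: prefix the path string with r;
b. Invalid file name: a Windows file name cannot contain any of the characters \ / : * ? " < > | (the name above contains |).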
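
One way to handle case b is to clean the name before opening the file. The following is a minimal sketch; the helper name safe_filename and the "_" replacement are assumptions made for illustration.

```python
import re

# Characters that Windows does not allow in file names
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')

def safe_filename(name, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    return _INVALID_CHARS.sub(replacement, name).strip()

title = "父亲节 | 致爱你最深 却又从不表达的人.txt"
print(safe_filename(title))
```

Apply the helper to the file name only, not to the full path, since it would also replace the directory separators.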

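4. Writing some pages to a TXT file keeps raising UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 172: illegal multibyte sequence:
Solution: pass the encoding='utf-8' argument when opening the file, i.e. f = open('test.txt', 'w', encoding='utf-8').
Cause: the web page and the Python source are both UTF-8, but when no encoding is given, open() on a Chinese-locale Windows system defaults to gbk, and any character that gbk cannot represent raises the error. Declaring UTF-8 when the file is opened avoids it.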
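
A minimal sketch of the fix, assuming a temporary directory stands in for the real output location. It writes the character from the error message with an explicit UTF-8 encoding and also shows that the same text cannot be encoded as gbk.

```python
import os
import tempfile

text = "bullet \u2022 character"   # U+2022 is the character from the error message

# Write and read back with an explicit encoding instead of the platform default
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

# The same text fails under gbk, which is what happens implicitly on Chinese Windows
try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

print(round_trip == text, gbk_ok)
```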

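5. Printing large output through SublimeREPL in Sublime Text 3 is very slow or even freezes:
Solution: build and run with Ctrl+B instead. This only works when the code does not call input(); with input() the run fails with EOFError: EOF when reading a line.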

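6. Running with Ctrl+B shows [decode error - output not utf-8]:
Solution: add a system environment variable named PYTHONIOENCODING with the value utf-8. If the problem involves a different encoding, use that encoding as the value instead.
Adding an environment variable (Windows 7): right-click Computer -> Properties -> Advanced system settings -> Environment Variables -> System variables (S) -> New -> enter the variable name and value -> confirm each dialog with OK.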

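7. IncompleteRead(25856 bytes read, 19 more expected) error:
Cause: the exception means the client expected to read A bytes from the HTTP body, but fewer than A bytes could actually be read.
The HTTP response headers look roughly like this: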

{'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '44668', 'content-encoding': 'gzip'}

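The Content-Length header is what tells the client whether the response body was transferred in full; here the body that was actually read is shorter than Content-Length.
Catching and handling the exception: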

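	import http.client

	for li in lie:
		try:
			li_code.append(get_html(li))				# fetch the HTML of the detail page
		except http.client.IncompleteRead as e:			# only this exception carries .partial
			print(e)
			# e.partial holds the bytes received before the read was cut short
			li_code.append(e.partial.decode('utf-8', errors='replace'))	# keep the partial page (adjust to your needs)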
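
The same idea can be tried in isolation without a network connection. In this sketch, read_tolerant and _TruncatedResponse are hypothetical names introduced for illustration; the fake response simply raises http.client.IncompleteRead the way a truncated body would.

```python
import http.client

def read_tolerant(resp):
    """Return (body, complete); on IncompleteRead return the partial bytes."""
    try:
        return resp.read(), True
    except http.client.IncompleteRead as e:
        return e.partial, False

class _TruncatedResponse:
    """Hypothetical stand-in for a response whose body was cut short."""
    def read(self):
        raise http.client.IncompleteRead(b"partial body", 19)

body, complete = read_tolerant(_TruncatedResponse())
print(body, complete)
```

Returning a flag alongside the bytes lets the caller decide whether a partial page is good enough or whether the request should be retried.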

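8. TabError: inconsistent use of tabs and spaces in indentation:
Cause: the indentation mixes tabs and spaces. A tab looks like ordinary spaces in the editor, so retyping the spaces again and again may still fail.
Solution: copy the indentation from a preceding line that works and paste it in front of the line named in the error.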
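
To locate the offending lines instead of guessing, something like the following sketch may help. The function name lines_with_tab_indent is an assumption, and the 4-column tab width passed to expandtabs should match your editor settings.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(number)
    return hits

sample = "def f():\n    x = 1\n\treturn x\n"   # line 3 is indented with a tab
print(lines_with_tab_indent(sample))

# Converting tabs to spaces removes the inconsistency
fixed = sample.expandtabs(4)
print(lines_with_tab_indent(fixed))
```

Be aware that expandtabs() also expands tabs inside string literals, so review the result before saving it over the original file.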

9. Missing borders (on merged cells) with the openpyxl module:
Add a fix_border.py file to your project; its complete code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from itertools import product
import re                           # the standard re module is enough here
import openpyxl
from openpyxl.utils import get_column_letter
from openpyxl import worksheet
from openpyxl.utils import range_boundaries

# Matches a single cell reference such as A1 or $B$2; used by merge_cells below
COORD_RE = re.compile(r'^[$]?([A-Za-z]{1,3})[$]?(\d+)$')

def patch_worksheet():
    """
    Monkeypatches Worksheet.merge_cells to remove the cell-deletion bug.
    https://bitbucket.org/openpyxl/openpyxl/issues/365/styling-merged-cells-isnt-working
    """
    def merge_cells(self, range_string=None, start_row=None, start_column=None, end_row=None, end_column=None):
        """
        Set merge on a cell range. Range is a cell range (e.g. A1:E1).
        This is monkeypatched to remove the cell-deletion bug.
        """
        if not range_string and not all((start_row, start_column, end_row, end_column)):
            msg = "You have to provide a value either for 'coordinate' or for 'start_row', 'start_column', 'end_row' *and* 'end_column'"
            raise ValueError(msg)
        elif not range_string:
            range_string = '%s%s:%s%s' % (get_column_letter(start_column),
                                          start_row,
                                          get_column_letter(end_column),
                                          end_row)
        elif ":" not in range_string:
            if COORD_RE.match(range_string):
                return  # single cell, nothing to do
            raise ValueError("Range must be a cell range (e.g. A1:E1)")
        else:
            range_string = range_string.replace('$', '')
        if range_string not in self.merged_cells:
            self.merged_cells.add(range_string)
        # The following code was removed by this monkeypatch:
        # min_col, min_row, max_col, max_row = range_boundaries(range_string)
        # rows = range(min_row, max_row+1)
        # cols = range(min_col, max_col+1)
        # cells = product(rows, cols)
        # all but the top-left cell are removed
        #for c in islice(cells, 1, None):
            #if c in self._cells:
                #del self._cells[c]
    # Apply the monkey patch
    worksheet.Worksheet.merge_cells = merge_cells
patch_worksheet()

Import the file above in your own program:

from fix_border import patch_worksheet

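Then call patch_worksheet() before loading the workbook.
The complete code (unverified) is:

import openpyxl
from fix_border import patch_worksheet

patch_worksheet()                         # apply the patch before loading
wb = openpyxl.load_workbook(filename1)    # filename1: path of the workbook to read
wb.save(filename2)                        # filename2: path to write the result to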

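10. Cleaning special characters out of scraped HTML:

	wz1 = re.sub('<.+?>', "", html_text)			# strip HTML tags from html_text (the scraped string); requires import re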
	wz2 = wz1.replace('\\r', '\r').replace('\\n', '\n').replace('\\t', '\t').replace(r'\u3000', '').replace(r'\xa0', '').replace('&nbsp;', ' ').replace('&ldquo;', '“').replace('&rdquo;', '”').replace('[', '').replace(']', '').replace("'", '').replace('&middot;', '·').replace('&mdash;', '—').replace('&bull;', '•').replace('&hellip;', '…')
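
As an alternative to listing entities one by one, the standard library's html.unescape() decodes every named and numeric entity in a single call. The sketch below is not the author's method: clean_html is a hypothetical helper, and turning non-breaking spaces into ordinary spaces is an assumption about the desired output.

```python
import html
import re

def clean_html(fragment):
    """Strip tags, decode entities, and normalise special spaces."""
    text = re.sub(r"<.+?>", "", fragment)     # remove tags (simple pattern)
    text = html.unescape(text)                # decode named and numeric entities
    # Replace non-breaking spaces with normal ones, drop ideographic spaces
    return text.replace("\xa0", " ").replace("\u3000", "")

cleaned = clean_html("<p>&ldquo;Python&rdquo;&nbsp;&bull;&nbsp;&hellip;</p>")
print(cleaned)
```

A regular expression is only adequate for simple markup; for nested or malformed pages an HTML parser is the safer choice.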

HTML special character entity reference table

Symbol Named entity Decimal code Symbol Named entity Decimal code Symbol Named entity Decimal code
Α &Alpha; &#913; Β &Beta; &#914; Γ &Gamma; &#915;
Δ &Delta; &#916; Ε &Epsilon; &#917; Ζ &Zeta; &#918;
Η &Eta; &#919; Θ &Theta; &#920; Ι &Iota; &#921;
Κ &Kappa; &#922; Λ &Lambda; &#923; Μ &Mu; &#924;
Ν &Nu; &#925; Ξ &Xi; &#926; Ο &Omicron; &#927;
Π &Pi; &#928; Ρ &Rho; &#929; Σ &Sigma; &#931;
Τ &Tau; &#932; Υ &Upsilon; &#933; Φ &Phi; &#934;
Χ &Chi; &#935; Ψ &Psi; &#936; Ω &Omega; &#937;
α &alpha; &#945; β &beta; &#946; γ &gamma; &#947;
δ &delta; &#948; ε &epsilon; &#949; ζ &zeta; &#950;
η &eta; &#951; θ &theta; &#952; ι &iota; &#953;
κ &kappa; &#954; λ &lambda; &#955; μ &mu; &#956;
ν &nu; &#957; ξ &xi; &#958; ο &omicron; &#959;
π &pi; &#960; ρ &rho; &#961; ς &sigmaf; &#962;
σ &sigma; &#963; τ &tau; &#964; υ &upsilon; &#965;
φ &phi; &#966; χ &chi; &#967; ψ &psi; &#968;
ω &omega; &#969; ϑ &thetasym; &#977; ϒ &upsih; &#978;
ϖ &piv; &#982; • &bull; &#8226; … &hellip; &#8230;
′ &prime; &#8242; ″ &Prime; &#8243; ‾ &oline; &#8254;
⁄ &frasl; &#8260; ℘ &weierp; &#8472; ℑ &image; &#8465;
ℜ &real; &#8476; ™ &trade; &#8482; ℵ &alefsym; &#8501;
← &larr; &#8592; ↑ &uarr; &#8593; → &rarr; &#8594;
↓ &darr; &#8595; ↔ &harr; &#8596; ↵ &crarr; &#8629;
⇐ &lArr; &#8656; ⇑ &uArr; &#8657; ⇒ &rArr; &#8658;
⇓ &dArr; &#8659; ⇔ &hArr; &#8660; ∀ &forall; &#8704;
∂ &part; &#8706; ∃ &exist; &#8707; ∅ &empty; &#8709;
∇ &nabla; &#8711; ∈ &isin; &#8712; ∉ &notin; &#8713;
∋ &ni; &#8715; ∏ &prod; &#8719; ∑ &sum; &#8721;
− &minus; &#8722; ∗ &lowast; &#8727; √ &radic; &#8730;
∝ &prop; &#8733; ∞ &infin; &#8734; ∠ &ang; &#8736;
∧ &and; &#8743; ∨ &or; &#8744; ∩ &cap; &#8745;
∪ &cup; &#8746; ∫ &int; &#8747; ∴ &there4; &#8756;
∼ &sim; &#8764; ≅ &cong; &#8773; ≈ &asymp; &#8776;
≠ &ne; &#8800; ≡ &equiv; &#8801; ≤ &le; &#8804;
≥ &ge; &#8805; ⊂ &sub; &#8834; ⊃ &sup; &#8835;
⊄ &nsub; &#8836; ⊆ &sube; &#8838; ⊇ &supe; &#8839;
⊕ &oplus; &#8853; ⊗ &otimes; &#8855; ⊥ &perp; &#8869;
⋅ &sdot; &#8901; ⌈ &lceil; &#8968; ⌉ &rceil; &#8969;
⌊ &lfloor; &#8970; ⌋ &rfloor; &#8971; ◊ &loz; &#9674;
♠ &spades; &#9824; ♣ &clubs; &#9827; ♥ &hearts; &#9829;
♦ &diams; &#9830;   &nbsp; &#160; ¡ &iexcl; &#161;
¢ &cent; &#162; £ &pound; &#163; ¤ &curren; &#164;
¥ &yen; &#165; ¦ &brvbar; &#166; § &sect; &#167;
¨ &uml; &#168; © &copy; &#169; ª &ordf; &#170;
« &laquo; &#171; ¬ &not; &#172; ­ &shy; &#173;
® &reg; &#174; ¯ &macr; &#175; ° &deg; &#176;
± &plusmn; &#177; ² &sup2; &#178; ³ &sup3; &#179;
´ &acute; &#180; µ &micro; &#181; " &quot; &#34;
< &lt; &#60; > &gt; &#62; ' &apos; &#39;

Reposted from blog.csdn.net/qq_38882327/article/details/89093473