1. Error "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape", for example:

>>> import sys
>>> sys.path.append("C:\Users\Administrator\Desktop\py")

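Fix: in Python, \ is the escape character inside string literals. \U starts an 8-digit Unicode escape (\u a 4-digit one), so \Users or \users in the path is read as a broken escape and the literal is rejected. Prefix the string with r to make it a raw string, in which backslashes are kept literally: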

>>> import sys
>>> sys.path.append(r"C:\Users\Administrator\Desktop\py")    # sys.path entries are directories, not .py files
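A minimal sketch of the usual ways to write a Windows path literal, plus a check with the built-in compile() that the unprefixed literal really fails. literal_compiles is a hypothetical helper written for this example, and the path is simply the one used above.

```python
import sys
from pathlib import PureWindowsPath


def literal_compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as Python code."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # the unicodeescape error is reported as a SyntaxError
        return False


# Source text p = "C:\Users\Administrator" is rejected: \U needs 8 hex digits
assert not literal_compiles('p = "C:\\Users\\Administrator"')
# The same literal with an r prefix compiles fine
assert literal_compiles('p = r"C:\\Users\\Administrator"')

raw = r"C:\Users\Administrator\Desktop\py"          # raw string: backslashes kept as-is
escaped = "C:\\Users\\Administrator\\Desktop\\py"   # every backslash doubled
forward = "C:/Users/Administrator/Desktop/py"       # Windows also accepts forward slashes

assert raw == escaped
assert PureWindowsPath(raw) == PureWindowsPath(forward)  # same path, different separators

sys.path.append(raw)  # append the folder that contains the module
```

Any of the three spellings works; the raw string is usually the least error-prone when a path is pasted from Explorer.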

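2. Opening a URL that contains Chinese characters with urllib.request.urlopen() raises an error (a UnicodeEncodeError, because the HTTP request line must be ASCII).
Fix: first percent-encode the Chinese part of the URL with urllib.parse.quote():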

import urllib.request
import urllib.parse

# quote() percent-encodes only the keyword: "中国" -> "%E4%B8%AD%E5%9B%BD"
url = "http://www.baidu.com/s?wd=" + urllib.parse.quote("中国")
resp = urllib.request.urlopen(url)
print(resp.read().decode('utf-8'))

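Note: quote only the Chinese part. Quoting the whole URL also fails, because quote() escapes everything except letters, digits, _.-~ and (by default) /, so the : in http:// and the ? and = of the query string get escaped too.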
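A short sketch comparing the options with the standard urllib.parse functions: quoting only the keyword, letting urlencode() build the query string, quoting the full URL with a wider safe set, and the broken result of quoting the full URL with the default safe='/'. No request is sent; only the URLs are built.

```python
from urllib.parse import quote, urlencode

base = "http://www.baidu.com/s"
keyword = "中国"

# Option 1: percent-encode only the Chinese value
url1 = base + "?wd=" + quote(keyword)

# Option 2: let urlencode() build and encode the whole query string
url2 = base + "?" + urlencode({"wd": keyword})

# Option 3: quote the full URL but keep the URL delimiters unescaped
url3 = quote(base + "?wd=" + keyword, safe=":/?=&")

# Wrong: the default safe='/' also escapes ':', '?' and '='
broken = quote(base + "?wd=" + keyword)

print(url1)    # http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD
print(broken)  # http%3A//www.baidu.com/s%3Fwd%3D%E4%B8%AD%E5%9B%BD
```

urlencode() is the more convenient choice when there are several query parameters, since it handles the & separators as well.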

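3. Error: OSError: [Errno 22] Invalid argument: 'C:/Users/Administrator/Desktop/py/newpy/父亲节 | 致爱你最深却又从不表达的人.txt'
Fix:
a. Path escaping: prefix the path string with r;
b. Invalid file name: a Windows file name cannot contain any of the ASCII (half-width) characters \ / : * ? " < > | (in the path above, the culprit is the | in the page title).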
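A minimal sketch of rule b: replace the forbidden characters before turning a page title into a file name. safe_filename is a hypothetical helper; it also replaces ASCII control characters, which Windows file names cannot contain either. Using _ as the replacement is an arbitrary choice.

```python
import re

# Characters not allowed in Windows file names, plus ASCII control characters
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, replacement: str = "_") -> str:
    """Hypothetical helper: make a page title usable as a Windows file name."""
    return INVALID_CHARS.sub(replacement, title).strip()


title = "父亲节 | 致爱你最深却又从不表达的人"   # the title from the error above
filename = safe_filename(title) + ".txt"
print(filename)  # 父亲节 _ 致爱你最深却又从不表达的人.txt
```

Full-width lookalikes such as ｜ or ？ are allowed, so replacing with those is an alternative if the title should stay readable.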

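4. When writing TXT files, some pages always raise UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 172: illegal multibyte sequence:
Fix: pass encoding='utf-8' when opening the file: f = open('test.txt', 'w', encoding='utf-8').
Cause: the page has already been decoded into Unicode text, but open() without an encoding argument uses the system's default encoding, which on Chinese-language Windows is GBK. Writing a character GBK cannot represent, such as '\u2022' (•), then fails. Declaring encoding='utf-8' when opening the file avoids the error.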
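A small sketch of the fix, writing the bullet character from the error message to a temporary file and reading it back. The errors="replace" variant at the end is an alternative not mentioned in the original tip, for when the file really must be GBK: characters GBK cannot encode are written as ?.

```python
import os
import tempfile

text = "\u2022 item 1\n\u2022 item 2\n"   # U+2022 is the bullet from the error message

folder = tempfile.mkdtemp()
path = os.path.join(folder, "test.txt")

# Declare the encoding instead of relying on the locale default (GBK on Chinese Windows)
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

# Alternative if the file must be GBK: unencodable characters become '?'
with open(os.path.join(folder, "test_gbk.txt"), "w", encoding="gbk", errors="replace") as f:
    f.write(text)
```

Reading the file later also needs encoding='utf-8'; otherwise the same default-encoding problem appears on the reading side.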

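5. In Sublime Text 3, printing large amounts of output through SublimeREPL is very slow or even freezes:
Fix: run the script with Ctrl+B (build) instead. This only works if the code does not call input(): a build has no interactive stdin, so input() raises EOFError: EOF when reading a line.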
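If a script has to run both in SublimeREPL and with Ctrl+B, one possible workaround (an assumption, not part of the original tip) is to wrap input() and fall back to a default when stdin has nothing to read. ask is a hypothetical helper name.

```python
import io
import sys


def ask(prompt: str, default: str = "") -> str:
    """Hypothetical helper: input() that returns `default` instead of raising EOFError."""
    try:
        return input(prompt)
    except EOFError:
        # stdin is closed or empty, as in a Ctrl+B build: use the default
        return default


if __name__ == "__main__":
    # Simulate a build with no interactive stdin
    sys.stdin = io.StringIO("")
    name = ask("Name: ", default="guest")
    print(name)
```

In an interactive session ask() behaves exactly like input(); only the no-input case changes.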

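6. When running with Ctrl+B, the output shows [decode error - output not utf-8]:
Fix: add an environment variable: under System variables, create a variable named PYTHONIOENCODING with the value utf-8. If a different encoding is involved, use that encoding as the value instead.
How to add an environment variable (Windows 7): right-click Computer -> Properties -> Advanced system settings -> Environment Variables -> System variables (S) -> New -> enter the variable name and value -> click OK in each dialog. Restart Sublime Text afterwards so it picks up the new variable.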
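If you prefer not to change a system-wide variable, a per-script alternative on Python 3.7+ is to switch the standard streams to UTF-8 at startup with reconfigure(). This is a sketch, not the original tip; force_utf8_output is a hypothetical helper, and the hasattr check skips streams that have been replaced by objects without reconfigure().

```python
import sys


def force_utf8_output() -> bool:
    """Hypothetical helper: switch stdout/stderr to UTF-8 (Python 3.7+).

    Returns True if at least one stream was reconfigured.
    """
    changed = False
    for stream in (sys.stdout, sys.stderr):
        # io.TextIOWrapper has reconfigure(); redirected streams may not
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8")
            changed = True
    return changed


force_utf8_output()
print("中文 \u2022 output")  # now written as UTF-8, which the build panel expects
```

Unlike PYTHONIOENCODING, this only affects the script that calls it, so it has to be placed near the top of each script.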

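7. IncompleteRead(25856 bytes read, 19 more expected) error:
Cause: this exception means the client expected to read a certain number of bytes (say A) from the HTTP body, but fewer than A bytes arrived before the connection ended.
The HTTP response headers look roughly like this:

{'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '44668', 'content-encoding': 'gzip'}

The header that tells the client whether the body arrived completely is content-length: the error means the body actually read is shorter than content-length (here 25856 bytes were read and 19 were still missing).
Catching and handling the exception: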

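	import http.client

	for li in lie:
		try:
			li_code.append(get_html(li))				# fetch the detail page's HTML
		except http.client.IncompleteRead as e:		# catch only this error: other exceptions have no .partial
			print(e)
			# e.partial holds the bytes received before the connection ended;
			# errors='ignore' drops a multi-byte character that may have been cut off at the end
			li_code.append(e.partial.decode('utf-8', errors='ignore'))	# keep the partial page (adapt as needed)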
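To see the exception without depending on a flaky remote server, the sketch below starts a throwaway local server (an assumption made only for this demo) that promises Content-Length: 24 but sends 5 bytes and closes the connection. The standard http.client then raises IncompleteRead, whose partial attribute holds the bytes that did arrive and expected the number still missing. serve_truncated_once, fetch and demo are hypothetical names.

```python
import http.client
import socket
import threading


def serve_truncated_once(listener):
    """Accept one request, promise 24 body bytes, send only 5, then close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:          # read the full request headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 24\r\n"
            b"Connection: close\r\n\r\n"
            b"hello"                                # 5 of the promised 24 bytes
        )


def fetch(host, port, path="/"):
    """Return (body, complete); on IncompleteRead return the partial body."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        try:
            return resp.read(), True
        except http.client.IncompleteRead as e:
            print(e)                                # IncompleteRead(5 bytes read, 19 more expected)
            return e.partial, False
    finally:
        conn.close()


def demo():
    """Run the truncated server once and fetch from it; returns (body, complete)."""
    listener = socket.socket()
    listener.settimeout(5)                          # don't hang forever if nothing connects
    listener.bind(("127.0.0.1", 0))                 # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_truncated_once, args=(listener,))
    server.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        server.join()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Whether a partial page is good enough depends on the use case; retrying the request a few times before falling back to e.partial is another common choice.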

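8. TabError: inconsistent use of tabs and spaces in indentation:
Cause: the indentation looks like spaces but is not (typically it contains a tab character), so retyping the spaces by hand, even many times, still fails.
Fix: copy the indentation from a preceding line that works and paste it in front of the line the error points to.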
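The offending lines can also be found and fixed programmatically. This sketch is an alternative to the manual fix, not the original method: find_tab_indents and tabs_to_spaces are hypothetical helpers that flag lines whose indentation contains a tab and expand those tabs to 4 spaces. The standard library's tabnanny module (python -m tabnanny yourfile.py) performs a similar check.

```python
def leading_whitespace(line: str) -> str:
    """Return the run of spaces/tabs at the start of a line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def find_tab_indents(source: str) -> list:
    """Hypothetical helper: 1-based numbers of lines whose indentation contains a tab."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if "\t" in leading_whitespace(line)]


def tabs_to_spaces(source: str, width: int = 4) -> str:
    """Hypothetical helper: expand tabs in indentation only, leaving the rest of each line alone."""
    fixed = []
    for line in source.splitlines(keepends=True):
        indent = leading_whitespace(line)
        fixed.append(indent.expandtabs(width) + line[len(indent):])
    return "".join(fixed)


# Line 2 is indented with spaces, line 3 with a tab: compiling this raises TabError
mixed = "if True:\n    x = 1\n\ty = 2\n"
print(find_tab_indents(mixed))                     # [3]
compile(tabs_to_spaces(mixed), "<fixed>", "exec")  # compiles once the tab is expanded
```

Only the indentation is touched, so tabs inside string literals keep their meaning.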

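9. openpyxl: borders missing on merged cells:
Add a file named fix_border.py that monkeypatches openpyxl's Worksheet.merge_cells. Its full code is below; it was written for older openpyxl releases (see the linked issue), so check it against the version you have installed: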

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import COORD_RE
from openpyxl.worksheet.worksheet import Worksheet


def patch_worksheet():
    """
    Monkeypatch Worksheet.merge_cells so that merging a range no longer deletes
    every cell except the top-left one (which also throws away their borders).
    https://bitbucket.org/openpyxl/openpyxl/issues/365/styling-merged-cells-isnt-working
    """
    def merge_cells(self, range_string=None, start_row=None, start_column=None, end_row=None, end_column=None):
        """
        Mark a cell range (e.g. A1:E1) as merged.
        Monkeypatched: the cells inside the range are kept instead of deleted.
        """
        if not range_string and not all((start_row, start_column, end_row, end_column)):
            msg = ("You have to provide a value either for 'coordinate' or for "
                   "'start_row', 'start_column', 'end_row' *and* 'end_column'")
            raise ValueError(msg)
        elif not range_string:
            range_string = '%s%s:%s%s' % (get_column_letter(start_column),
                                          start_row,
                                          get_column_letter(end_column),
                                          end_row)
        elif ":" not in range_string:
            if COORD_RE.match(range_string):
                return  # a single cell: nothing to merge
            raise ValueError("Range must be a cell range (e.g. A1:E1)")
        else:
            range_string = range_string.replace('$', '')
        if range_string not in self.merged_cells:
            self.merged_cells.add(range_string)
        # Removed by this monkeypatch (the original openpyxl behaviour):
        # min_col, min_row, max_col, max_row = range_boundaries(range_string)
        # rows = range(min_row, max_row+1)
        # cols = range(min_col, max_col+1)
        # cells = product(rows, cols)
        # all but the top-left cell are removed
        # for c in islice(cells, 1, None):
        #     if c in self._cells:
        #         del self._cells[c]
    # apply the monkey patch
    Worksheet.merge_cells = merge_cells


patch_worksheet()

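Import the file above in your own program:

from fix_border import patch_worksheet

Then call patch_worksheet() before loading the workbook.
Complete code (untested):

import openpyxl
from fix_border import patch_worksheet

patch_worksheet()                          # patch first, so loading keeps the cells of merged ranges
wb = openpyxl.load_workbook(filename1)     # filename1, filename2: your input and output paths
wb.save(filename2)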

10. Removing HTML special characters from scraped text:

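	import re
	wz1 = re.sub('<.+?>', '', html_text, flags=re.S)	# remove HTML tags, even across lines (html_text: the scraped string; naming it str would shadow the built-in)
	wz2 = wz1.replace('\\r', '\r').replace('\\n', '\n').replace('\\t', '\t').replace(r'\u3000', '').replace(r'\xa0', '').replace('&nbsp;', ' ').replace('&ldquo;', '“').replace('&rdquo;', '”').replace('[', '').replace(']', '').replace("'", '').replace('&middot;', '·').replace('&mdash;', '—').replace('&bull;', '•').replace('&hellip;', '…')	# literal \r \n \t -> real characters; drop literal \u3000/\xa0 escapes and the [ ] ' left by str(list); decode common entities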
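Instead of listing entities one by one, the standard library's html.unescape() decodes every named and numeric entity. A minimal sketch, assuming the input is a single HTML string; clean_html is a hypothetical helper, and mapping the non-breaking and ideographic spaces to an ordinary space (rather than deleting them) is a choice made here. If the text came from calling str() on a list, joining the list with ''.join(...) avoids the literal \n, brackets and quotes in the first place.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # any tag, even one spanning several lines


def clean_html(fragment: str) -> str:
    """Hypothetical helper: strip tags, decode entities, normalize special spaces."""
    text = TAG_RE.sub("", fragment)        # remove HTML tags
    text = html.unescape(text)             # &ldquo; -> “, &#8230; -> …, &nbsp; -> U+00A0
    # non-breaking space (U+00A0) and ideographic space (U+3000) -> ordinary space
    text = text.replace("\xa0", " ").replace("\u3000", " ")
    return text.strip()


print(clean_html("<p>&ldquo;你好&rdquo;&nbsp;world&hellip;</p>"))  # “你好” world…
```

Tags are stripped before unescaping so that an escaped &lt;b&gt; in the text survives as visible text instead of being removed as a tag.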

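HTML special characters: reference table

Symbol	Named entity	Decimal code	Symbol	Named entity	Decimal code	Symbol	Named entity	Decimal code
Α	&Alpha;	&#913;	Β	&Beta;	&#914;	Γ	&Gamma;	&#915;
Δ	&Delta;	&#916;	Ε	&Epsilon;	&#917;	Ζ	&Zeta;	&#918;
Η	&Eta;	&#919;	Θ	&Theta;	&#920;	Ι	&Iota;	&#921;
Κ	&Kappa;	&#922;	Λ	&Lambda;	&#923;	Μ	&Mu;	&#924;
Ν	&Nu;	&#925;	Ξ	&Xi;	&#926;	Ο	&Omicron;	&#927;
Π	&Pi;	&#928;	Ρ	&Rho;	&#929;	Σ	&Sigma;	&#931;
Τ	&Tau;	&#932;	Υ	&Upsilon;	&#933;	Φ	&Phi;	&#934;
Χ	&Chi;	&#935;	Ψ	&Psi;	&#936;	Ω	&Omega;	&#937;
α	&alpha;	&#945;	β	&beta;	&#946;	γ	&gamma;	&#947;
δ	&delta;	&#948;	ε	&epsilon;	&#949;	ζ	&zeta;	&#950;
η	&eta;	&#951;	θ	&theta;	&#952;	ι	&iota;	&#953;
κ	&kappa;	&#954;	λ	&lambda;	&#955;	μ	&mu;	&#956;
ν	&nu;	&#957;	ξ	&xi;	&#958;	ο	&omicron;	&#959;
π	&pi;	&#960;	ρ	&rho;	&#961;	ς	&sigmaf;	&#962;
σ	&sigma;	&#963;	τ	&tau;	&#964;	υ	&upsilon;	&#965;
φ	&phi;	&#966;	χ	&chi;	&#967;	ψ	&psi;	&#968;
ω	&omega;	&#969;	ϑ	&thetasym;	&#977;	ϒ	&upsih;	&#978;
ϖ	&piv;	&#982;	•	&bull;	&#8226;	…	&hellip;	&#8230;
′	&prime;	&#8242;	″	&Prime;	&#8243;	‾	&oline;	&#8254;
⁄	&frasl;	&#8260;	℘	&weierp;	&#8472;	ℑ	&image;	&#8465;
ℜ	&real;	&#8476;	™	&trade;	&#8482;	ℵ	&alefsym;	&#8501;
←	&larr;	&#8592;	↑	&uarr;	&#8593;	→	&rarr;	&#8594;
↓	&darr;	&#8595;	↔	&harr;	&#8596;	↵	&crarr;	&#8629;
⇐	&lArr;	&#8656;	⇑	&uArr;	&#8657;	⇒	&rArr;	&#8658;
⇓	&dArr;	&#8659;	⇔	&hArr;	&#8660;	∀	&forall;	&#8704;
∂	&part;	&#8706;	∃	&exist;	&#8707;	∅	&empty;	&#8709;
∇	&nabla;	&#8711;	∈	&isin;	&#8712;	∉	&notin;	&#8713;
∋	&ni;	&#8715;	∏	&prod;	&#8719;	∑	&sum;	&#8721;
−	&minus;	&#8722;	∗	&lowast;	&#8727;	√	&radic;	&#8730;
∝	&prop;	&#8733;	∞	&infin;	&#8734;	∠	&ang;	&#8736;
∧	&and;	&#8743;	∨	&or;	&#8744;	∩	&cap;	&#8745;
∪	&cup;	&#8746;	∫	&int;	&#8747;	∴	&there4;	&#8756;
∼	&sim;	&#8764;	≅	&cong;	&#8773;	≈	&asymp;	&#8776;
≠	&ne;	&#8800;	≡	&equiv;	&#8801;	≤	&le;	&#8804;
≥	&ge;	&#8805;	⊂	&sub;	&#8834;	⊃	&sup;	&#8835;
⊄	&nsub;	&#8836;	⊆	&sube;	&#8838;	⊇	&supe;	&#8839;
⊕	&oplus;	&#8853;	⊗	&otimes;	&#8855;	⊥	&perp;	&#8869;
⋅	&sdot;	&#8901;	⌈	&lceil;	&#8968;	⌉	&rceil;	&#8969;
⌊	&lfloor;	&#8970;	⌋	&rfloor;	&#8971;	◊	&loz;	&#9674;
♠	&spades;	&#9824;	♣	&clubs;	&#9827;	♥	&hearts;	&#9829;
♦	&diams;	&#9830;	(non-breaking space)	&nbsp;	&#160;	¡	&iexcl;	&#161;
¢	&cent;	&#162;	£	&pound;	&#163;	¤	&curren;	&#164;
¥	&yen;	&#165;	¦	&brvbar;	&#166;	§	&sect;	&#167;
¨	&uml;	&#168;	©	&copy;	&#169;	ª	&ordf;	&#170;
«	&laquo;	&#171;	¬	&not;	&#172;	(soft hyphen)	&shy;	&#173;
®	&reg;	&#174;	¯	&macr;	&#175;	°	&deg;	&#176;
±	&plusmn;	&#177;	²	&sup2;	&#178;	³	&sup3;	&#179;
´	&acute;	&#180;	µ	&micro;	&#181;	"	&quot;	&#34;
<	&lt;	&#60;	>	&gt;	&#62;	'	&apos; (HTML5 only)	&#39;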
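The table can also be generated or checked from Python's html.entities.name2codepoint, which maps the HTML 4 entity names to code points. entity_row is a hypothetical helper. &apos; is not in this mapping, since it only became a named entity in HTML5 (it appears in html.entities.html5 instead).

```python
from html.entities import name2codepoint


def entity_row(name: str) -> tuple:
    """Hypothetical helper: (symbol, named entity, decimal code) for an HTML 4 entity name."""
    cp = name2codepoint[name]
    return chr(cp), f"&{name};", f"&#{cp};"


# Print a few rows in the same tab-separated layout as the table
for name in ("Alpha", "sum", "and", "asymp", "micro"):
    print("\t".join(entity_row(name)))
```

Looping over sorted(name2codepoint.items(), key=lambda kv: kv[1]) instead of a fixed tuple prints all 252 HTML 4 entities in code-point order.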
