Copyright notice: reposting is welcome. When reposting, please cite the source: https://blog.csdn.net/weixin_32820767 https://blog.csdn.net/weixin_32820767/article/details/82287671
Problem
Reading a CSV file with Python's pandas module raises an error.
Code:
file_path = os.path.join(self.ROOT, self.file)
data = pd.read_csv(file_path)
Error:
File "name.py", line 24, in name
data = pd.read_csv(file_path)
File "/home/XXX/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/XXX/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 455, in _read
data = parser.read(nrows)
File "/home/XXX/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1069, in read
ret = self._engine.read(nrows)
File "/home/XXX/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1839, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 978, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2208, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 5
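The message says the parser inferred 2 fields per row but found 5 on line 4. Before dropping anything, it can help to see which lines are irregular. The following is a minimal sketch using only the standard library; `find_irregular_lines` is a hypothetical helper name, the sample text is made up for illustration, and the sketch assumes a comma delimiter and that the first row defines the expected field count.

```python
import csv
import io


def find_irregular_lines(text_stream, delimiter=","):
    """Return (line_number, field_count) for every row whose field
    count differs from that of the first row."""
    reader = csv.reader(text_stream, delimiter=delimiter)
    expected = None
    irregular = []
    for row in reader:
        if expected is None:
            # The first row sets the expected number of fields.
            expected = len(row)
            continue
        if len(row) != expected:
            # reader.line_num is the 1-based count of lines read so far.
            irregular.append((reader.line_num, len(row)))
    return irregular


# Made-up sample: line 4 has 5 fields instead of 2.
sample = io.StringIO("a,b\n1,2\n3,4\n5,6,7,8,9\n")
print(find_irregular_lines(sample))
```

For a real file, the same function could be called on an open file handle, for example `open(file_path, newline="")`.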
Answer
I found an answer in the question "Python Pandas Error tokenizing data",
which suggests using:
data = pd.read_csv('file1.csv', error_bad_lines=False)
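Note that `error_bad_lines` was deprecated in pandas 1.3 and removed in pandas 2.0, where the `on_bad_lines` parameter takes its place. The sketch below shows the equivalent call on newer versions; it assumes pandas 1.3 or later, and the inline sample data is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample: the third line has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# on_bad_lines="skip" drops rows with too many fields instead of raising.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Skipping silently discards data, so it is worth checking how many rows were dropped before relying on the result.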
Cause
1 The official documentation describes the parameter as follows:
error_bad_lines : boolean, default True
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.
2 The error is raised in the following situation:
Handling “bad” lines
Some files may have malformed lines with too few fields or too many. Lines with too few fields will have NA values filled in the trailing fields. Lines with too many will cause an error by default:
In [27]: data = 'a,b,c\n1,2,3\n4,5,6,7\n8,9,10'
In [28]: pd.read_csv(StringIO(data))
---------------------------------------------------------------------------
CParserError Traceback (most recent call last)
CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
You can elect to skip bad lines:
In [29]: pd.read_csv(StringIO(data), error_bad_lines=False)
Skipping line 3: expected 3 fields, saw 4
Out[29]:
   a  b   c
0  1  2   3
1  8  9  10
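The quoted documentation also says that lines with too few fields do not raise an error and instead get NA values in the trailing fields. A small sketch of that case, with made-up sample data:

```python
import io

import pandas as pd

# Made-up sample: the third line is missing its last field.
raw = "a,b,c\n1,2,3\n4,5\n8,9,10"

# No error is raised; the missing trailing field becomes NaN.
df = pd.read_csv(io.StringIO(raw))
print(df)
```

Because of the NaN, column `c` is read as floating point rather than integer.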