[pandas] read_csv()

A summary of the parameters of pandas.read_csv.
 
Reads a CSV (comma-separated) file into a DataFrame.
Also supports optionally reading only a portion of the file, or iterating over it in chunks.
Parameters:
filepath_or_buffer   : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string can also be a URL; valid URL schemes include http, ftp, s3, and file. Support for reading multiple files is in preparation.
Example of reading a local file: file://localhost/path/to/table.csv
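
A minimal sketch of both styles (the local path is made up for illustration):

    import pandas as pd
    from io import StringIO

    # From a local path (hypothetical file)
    df = pd.read_csv('/path/to/table.csv')

    # From any object with a read() method, e.g. an in-memory buffer
    df = pd.read_csv(StringIO('a,b\n1,2\n3,4'))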
 
sep   : str, default ','
The delimiter to use. If not specified, pandas attempts to split on commas. Separators longer than one character (and different from '\s+') are interpreted as regular expressions and force the python parsing engine; note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
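
A minimal sketch of a whitespace/regex separator (data inlined for illustration):

    import pandas as pd
    from io import StringIO

    # Whitespace-separated data; r'\s+' matches runs of spaces and tabs
    df = pd.read_csv(StringIO('a b\tc\n1 2\t3'), sep=r'\s+')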
 
delimiter   : str, default None
Alternative argument name for the delimiter (if this parameter is specified, sep is ignored).
 
delim_whitespace   : boolean, default False
Specifies whether whitespace (e.g. ' ' or '\t') should be used as the separator; equivalent to setting sep='\s+'. If this option is set to True, the delimiter parameter is ignored.
New in version 0.18.1.
 
header   : int or list of ints, default 'infer'
Row number(s) to use as the column names, and the start of the data. If no column names are passed, the default behaves like header=0 and names are inferred from the first line; if names are passed explicitly, it behaves like header=None. Explicitly passing header=0 replaces any existing column names. header can also be a list of ints, e.g. [0,1,3], giving the rows to use as a multi-level column header; rows in between are skipped (in this example row 2 is discarded, rows 0, 1 and 3 form the header, and the DataFrame data starts at row 4). A sketch follows the note below.
Note: when skip_blank_lines=True this parameter ignores comment lines and blank lines, so header=0 denotes the first row of data rather than the first line of the file.
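
A minimal sketch of a multi-row header (data inlined for illustration):

    import pandas as pd
    from io import StringIO

    data = 'x,y\nskip,me\nu,v\n1,2\n3,4'
    # Rows 0 and 2 form a two-level column header; row 1 in between
    # is discarded, and the data starts at row 3
    df = pd.read_csv(StringIO(data), header=[0, 2])
    # df.columns -> MultiIndex([('x', 'u'), ('y', 'v')])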
 
names   : array-like, default None
List of column names to use. If the data file contains no header row, you should also pass header=None. Duplicate names in this list are not allowed unless mangle_dupe_cols=True is set.
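
A minimal sketch (the column names are invented for illustration):

    import pandas as pd
    from io import StringIO

    # No header row in the data: pass header=None plus explicit names
    df = pd.read_csv(StringIO('1,2\n3,4'), header=None, names=['a', 'b'])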
 
index_col   : int or sequence or False, default None
Column number or column name to use as the row index; if a sequence is given, a MultiIndex is used.
If the file is irregular and rows end with a trailing delimiter, index_col=False makes pandas treat the first column as data rather than as the row index.
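
A minimal sketch (data inlined for illustration):

    import pandas as pd
    from io import StringIO

    # Use the first column ('id') as the row index
    df = pd.read_csv(StringIO('id,val\nx,1\ny,2'), index_col=0)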
 
usecols   : array-like, default None
Return a subset of the columns. Values in the list must either be positional (integer indices into the file's columns) or strings corresponding to column names in the file. For example, usecols=[0, 1, 2] and usecols=['foo', 'bar', 'baz'] are both valid. Using this parameter speeds up loading and reduces memory consumption.
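
A minimal sketch of both forms (column names invented for illustration):

    import pandas as pd
    from io import StringIO

    data = 'foo,bar,baz,qux\n1,2,3,4'
    # Select columns by position ...
    df = pd.read_csv(StringIO(data), usecols=[0, 1, 2])
    # ... or by name
    df = pd.read_csv(StringIO(data), usecols=['foo', 'bar', 'baz'])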
 
as_recarray   : boolean, default False
Deprecated: this parameter will be removed in a future version. Use pd.read_csv(...).to_records() instead.
Return a NumPy recarray instead of a DataFrame. If this parameter is set to True, it takes precedence over the squeeze parameter, row indexes are no longer available, and the index_col parameter is ignored.
 
squeeze   : boolean, default False
If the parsed data contains only one column, return a Series.
 
prefix   : str, default None
Prefix to add to column numbers when there is no column header. For example, 'X' yields columns X0, X1, ...
 
mangle_dupe_cols   : boolean, default True
Duplicate columns 'X'...'X' are renamed 'X.0'...'X.N'. If set to False, data in duplicate columns will overwrite each other.
 
dtype   : Type name or dict of column -> type, default None
Data type(s) to apply, either for the whole DataFrame or per column. E.g. {'a': np.float64, 'b': np.int32}
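
A minimal sketch of the example above:

    import numpy as np
    import pandas as pd
    from io import StringIO

    # Force dtypes per column at parse time
    df = pd.read_csv(StringIO('a,b\n1,2\n3,4'),
                     dtype={'a': np.float64, 'b': np.int32})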
 
engine   : {'c', 'python'}, optional
Parser engine to use. The C engine is faster, while the python engine is currently more feature-complete.
 
converters   : dict, default None
Dict of functions for converting values in certain columns. Keys can be column names or column numbers.
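
A minimal sketch (the stripping converter is invented for illustration):

    import pandas as pd
    from io import StringIO

    # Strip stray spaces from column 'a' while parsing
    df = pd.read_csv(StringIO('a,b\n 1 ,2\n 3 ,4'),
                     converters={'a': lambda s: int(s.strip())})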
 
true_values   : list, default None
Values to consider as True
 
false_values   : list, default None
Values to consider as False
 
skipinitialspace   : boolean, default False
Skip whitespace after the delimiter (default is False, i.e. not skipped).
 
skiprows   : list-like or integer, default None
Number of lines to skip at the start of the file, or a list of line numbers to skip (0-indexed).
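
A minimal sketch of both forms (data inlined for illustration):

    import pandas as pd
    from io import StringIO

    data = 'junk line\na,b\n1,2'
    # Skip the first line by count ...
    df = pd.read_csv(StringIO(data), skiprows=1)
    # ... or by a list of 0-indexed line numbers
    df = pd.read_csv(StringIO(data), skiprows=[0])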
 
skipfooter   : int, default 0
Number of lines to skip at the end of the file (unsupported with the C engine).
 
skip_footer   : int, default 0
Deprecated: use skipfooter instead; the behavior is identical.
 
nrows   : int, default None
Number of rows of the file to read (counted from the start of the file).
 
na_values   : scalar, str, list-like, or dict, default None
Additional values to recognize as NA/NaN. If a dict is passed, per-column NA values can be specified. By default, values such as '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan' are interpreted as NaN.
 
keep_default_na   : bool, default True
If na_values is specified and keep_default_na=False, the default NaN values are replaced; otherwise the specified values are appended to them.
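
A minimal sketch of how the two parameters interact (the 'missing' sentinel is invented for illustration):

    import pandas as pd
    from io import StringIO

    # 'missing' is added to the built-in NaN sentinels
    df = pd.read_csv(StringIO('a,b\nmissing,2'), na_values=['missing'])

    # With keep_default_na=False only 'missing' counts as NaN,
    # so the literal string 'NA' survives as data
    df = pd.read_csv(StringIO('a,b\nNA,missing'),
                     na_values=['missing'], keep_default_na=False)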
 
na_filter   : boolean, default True
Detect missing value markers (empty strings and NA values). For large files with no missing values, setting na_filter=False can improve reading speed.
 
verbose   : boolean, default False
Print additional parser output, e.g. the number of NA values found in non-numeric columns.
 
skip_blank_lines   : boolean, default True
If True, skip blank lines rather than interpreting them as NaN values.
 
parse_dates   : boolean or list of ints or names or list of lists or dict, default False
  • boolean. If True -> parse the index as dates.
  • list of ints or names. e.g. [1, 2, 3] -> parse columns 1, 2 and 3 each as a separate date column.
  • list of lists. e.g. [[1, 3]] -> combine columns 1 and 3 and parse them as a single date column.
  • dict, e.g. {'foo': [1, 3]} -> combine columns 1 and 3, parse them as a date, and name the resulting column 'foo' (see the sketch below).
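
A minimal sketch of the dict form (the column layout is invented for illustration):

    import pandas as pd
    from io import StringIO

    data = 'y,m,d,val\n2024,1,2,10\n2024,3,4,20'
    # Combine columns 0, 1 and 2 into one parsed date column named 'date'
    df = pd.read_csv(StringIO(data), parse_dates={'date': [0, 1, 2]})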
 
infer_datetime_format   : boolean, default False
If True and parse_dates is enabled, pandas attempts to infer the format of the datetime strings, and if it can be inferred, switches to a faster parsing method. In some cases this speeds up parsing by 5-10x.
 
keep_date_col   : boolean, default False
If multiple columns are combined to parse a date, keep the original source columns as well. The default is False.
 
date_parser   : function, default None
Function to use for converting strings to dates; by default, dateutil.parser.parser performs the conversion. Pandas tries calling date_parser in three progressively different ways, advancing to the next on failure:
1. with one or more arrays (as specified by parse_dates) as arguments;
2. with the string values of the columns specified by parse_dates concatenated row-wise into a single array;
3. once per row, with one or more strings (from the columns specified by parse_dates) as arguments.
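
A minimal sketch of a custom parser for a non-standard format (the '|'-separated date format is invented for illustration); here pandas falls back to calling the function once per value:

    import pandas as pd
    from io import StringIO
    from datetime import datetime

    data = 'd,val\n02|01|2024,1\n03|02|2024,2'
    df = pd.read_csv(StringIO(data), parse_dates=['d'],
                     date_parser=lambda s: datetime.strptime(s, '%d|%m|%Y'))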
 
dayfirst   : boolean, default False
Parse dates in DD/MM format.
 
iterator   : boolean, default False
Return a TextFileReader object for reading the file block by block.
 
chunksize   : int, default None
Number of rows per file block. See the IO Tools docs for more information on iterator and chunksize.
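
A minimal sketch of chunked reading (data generated for illustration):

    import pandas as pd
    from io import StringIO

    data = 'a,b\n' + '\n'.join('%d,%d' % (i, i * 2) for i in range(10))
    # chunksize returns a TextFileReader; each iteration yields a
    # DataFrame of at most 4 rows
    for chunk in pd.read_csv(StringIO(data), chunksize=4):
        print(len(chunk))  # 4, 4, 2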
 
compression   : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
Decompress on-disk data directly. If 'infer', files ending in '.gz', '.bz2', '.zip', or '.xz' are decompressed with gzip, bz2, zip, or xz respectively; other files are not decompressed. If 'zip' is used, the ZIP archive must contain exactly one data file. Set to None for no decompression.
New in version 0.18.1: support for 'zip' and 'xz'.
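
A minimal sketch that writes and re-reads a gzipped CSV (the file name 'table.csv.gz' is made up for illustration):

    import gzip
    import pandas as pd

    with gzip.open('table.csv.gz', 'wt') as f:
        f.write('a,b\n1,2\n')
    # compression='infer' (the default) would also work here,
    # because of the '.gz' suffix
    df = pd.read_csv('table.csv.gz', compression='gzip')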
 
thousands   : str, default None
Thousands separator, such as ',' or '.'.
 
decimal   : str, default '.'
Character to recognize as the decimal point (e.g. ',' for European data).
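
A minimal sketch of European-style numbers, combining thousands and decimal (data inlined for illustration):

    import pandas as pd
    from io import StringIO

    # '1.234,5' is parsed as 1234.5: '.' groups thousands, ',' is decimal
    df = pd.read_csv(StringIO('a;b\n1.234,5;2'),
                     sep=';', thousands='.', decimal=',')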
 
float_precision   : string, default None
Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.
 
lineterminator   : str (length 1), default None
Character used to denote a line break (C parser only).
 
quotechar   : str (length 1), optional
The character used to mark the start and end of a quoted item; delimiters inside quotes are ignored.
 
quoting   : int or csv.QUOTE_* instance, default 0
Control field quoting behavior per the csv.QUOTE_* constants: QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2), or QUOTE_NONE (3).
 
doublequote   : boolean, default True
When quotechar is specified and quoting is not QUOTE_NONE, interpret two consecutive quotechar characters inside a field as a single quotechar character.
 
escapechar   : str (length 1), default None
Character used to escape the delimiter when quoting is QUOTE_NONE.
 
comment   : str, default None
Indicates that the remainder of a line should not be parsed. If the character is found at the beginning of a line, the whole line is ignored. This parameter must be a single character. Like blank lines (when skip_blank_lines=True), fully commented lines are ignored by the header and skiprows parameters. For example, with comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 yields 'a,b,c' as the header (see the sketch below).
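
A minimal sketch of the example above:

    import pandas as pd
    from io import StringIO

    # The '#empty' line is ignored, so 'a,b,c' becomes the header
    df = pd.read_csv(StringIO('#empty\na,b,c\n1,2,3'), comment='#')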
 
encoding   : str, default None
Encoding to use when reading, commonly 'utf-8'. See the list of Python standard encodings.
 
dialect   : str or csv.Dialect instance, default None
If None, defaults to the Excel dialect. Ignored if sep is longer than one character. See the csv.Dialect documentation for more details.
 
tupleize_cols   : boolean, default False
Leave a list of tuples on columns as is (the default is to convert them to a MultiIndex on the columns).
 
error_bad_lines   : boolean, default True
Lines with too many fields will by default cause an exception to be raised and no DataFrame will be returned. If set to False, these 'bad lines' are instead dropped from the resulting DataFrame (only valid with the C parser).
 
warn_bad_lines   : boolean, default True
If error_bad_lines=False and warn_bad_lines=True, a warning is output for each 'bad line' (only valid with the C parser).
 
low_memory   : boolean, default True
Internally process the file in chunks, which lowers memory use while parsing but may produce mixed type inference. To ensure consistent types, set low_memory=False or specify the column types with the dtype parameter. Note that the whole file is still read into a single DataFrame either way; use the iterator or chunksize parameter to actually get the data back in chunks (only valid with the C parser).
 
buffer_lines   : int, default None
Deprecated: this parameter will be removed in a future version because its value is no longer respected by the parser.
 
compact_ints   : boolean, default False
Deprecated: this parameter will be removed in a future version.
If compact_ints=True, any column of integer dtype is stored using the smallest integer type that can hold it; whether it is signed or unsigned depends on the use_unsigned parameter.
 
use_unsigned   : boolean, default False
Deprecated: this parameter will be removed in a future version.
If integer columns are being compacted (i.e. compact_ints=True), specify whether the compacted columns should be signed or unsigned.

memory_map   : boolean, default False
If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data from there. Using this option avoids repeated file I/O operations.

Origin www.cnblogs.com/pyleu1028/p/10993170.html