Pandas: pd.cut() and pd.qcut()

pandas.cut

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)

Parameters
x:array-like
The input array to be binned. Must be 1-dimensional.
bins:int, sequence of scalars, or IntervalIndex
The criteria to bin by.

• int : Defines the number of equal-width bins in the range of x.
The range of x is extended by .1% on each side to include the
minimum and maximum values of x.
• sequence of scalars : Defines the bin edges allowing for
non-uniform width. No extension of the range of x is done.
• IntervalIndex : Defines the exact bins to be used. Note that
IntervalIndex for bins must be non-overlapping.

right:bool, default True
Indicates whether bins includes the rightmost edge or not. If right == True (the default), then the bins [1, 2, 3, 4] indicate (1,2], (2,3], (3,4]. This argument is ignored when bins is an IntervalIndex.
labels:array or False, default None
Specifies the labels for the returned bins. Must be the same length as the resulting bins. If False, returns only integer indicators of the bins. This affects the type of the output container (see below). This argument is ignored when bins is an IntervalIndex. If True, raises an error. When ordered=False, labels must be provided.
retbins:bool, default False
Whether to return the bins or not. Useful when bins is provided as a scalar.
precision:int, default 3
The precision at which to store and display the bins labels.
include_lowest:bool, default False
Whether the first interval should be left-inclusive or not.
duplicates:{default ‘raise’, ‘drop’}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.
ordered:bool, default True
Whether the labels are ordered or not. Applies to returned types Categorical and Series (with Categorical dtype). If True, the resulting categorical will be ordered. If False, the resulting categorical will be unordered (labels must be provided).
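
The effect of right, include_lowest and duplicates is easiest to see on a tiny input. The following is a minimal sketch; the sample values and bin edges are made up for illustration and are not from the pandas documentation.

```python
import pandas as pd

values = [1, 2, 3, 4]   # hypothetical sample data
edges = [1, 2, 3, 4]    # explicit bin edges

# right=True (default): bins are (1, 2], (2, 3], (3, 4], so the value 1 falls outside
right_closed = pd.cut(values, bins=edges)

# include_lowest=True makes the first interval left-inclusive, so 1 is kept
with_lowest = pd.cut(values, bins=edges, include_lowest=True)

# right=False: bins are [1, 2), [2, 3), [3, 4), so the value 4 falls outside
left_closed = pd.cut(values, bins=edges, right=False)

# Repeated edges raise ValueError unless duplicates="drop" is passed
try:
    pd.cut(values, bins=[1, 2, 2, 4])
    raised = False
except ValueError:
    raised = True
deduplicated = pd.cut(values, bins=[1, 2, 2, 4], duplicates="drop")

# .codes holds the bin index of each value; -1 marks a value outside every bin
print(right_closed.codes, with_lowest.codes, left_closed.codes, deduplicated.codes)
```

A value that falls outside all bins becomes NaN in the result, which shows up as code -1 in the Categorical.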

Returns
out:Categorical, Series, or ndarray
An array-like object representing the respective bin for each value of x. The type depends on the value of labels.

• None (default) : returns a Series for Series x or a Categorical
for all other inputs. The values stored within are Interval dtype.
• sequence of scalars : returns a Series for Series x or a
Categorical for all other inputs. The values stored within are
whatever the type in the sequence is.
• False : returns an ndarray of integers.

bins:numpy.ndarray or IntervalIndex.

The computed or specified bins. Only returned when retbins=True. For scalar or sequence bins, this is an ndarray with the computed bins. If duplicates='drop' is set, non-unique bin edges are dropped. For an IntervalIndex bins, this is equal to bins.
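
The return types described above can be checked directly. This is a small sketch that reuses the array from the example later in this post; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([1, 7, 5, 4, 6, 3])

as_categorical = pd.cut(values, 3)              # array input -> Categorical
as_series = pd.cut(pd.Series(values), 3)        # Series input -> Series (category dtype)
as_codes = pd.cut(values, 3, labels=False)      # labels=False -> ndarray of integers
binned, edges = pd.cut(values, 3, retbins=True) # retbins=True -> (result, bin edges)

print(type(as_categorical).__name__)
print(as_series.dtype)
print(as_codes)
print(edges)
```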

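pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
Bin values into discrete intervals.
Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut can convert ages into groups of age ranges. It supports binning into a given number of equal-width bins or into a pre-specified array of bin edges.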
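
As a sketch of the age example mentioned above, the snippet below bins ages into named groups. The ages, the bin edges and the group names are all hypothetical and chosen only for illustration.

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 42, 70, 88], name="age")  # hypothetical ages
bins = [0, 12, 19, 64, 120]                            # assumed group boundaries
labels = ["child", "teen", "adult", "senior"]          # one label per bin

# Each age is mapped to the label of the (left, right] interval it falls in
age_group = pd.cut(ages, bins=bins, labels=labels)
counts = age_group.value_counts()

print(age_group.tolist())
print(counts["adult"])
```

There must be exactly one label per bin, so four labels go with five edges.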

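Parameters:

x: array-like. The input array to be binned; must be one-dimensional.
bins: the number of bins, usually an int; can also be a sequence of bin edges.
retbins: bool. Whether to return the bin edges as well; they are returned when True.
precision: int, default 3. The precision at which to store and display the bin labels.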

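Returns:

out:
An array-like object representing the respective bin for each value of x. The type depends on the value of labels.
None (default): returns a Series for Series x or a Categorical for all other inputs. The values stored within are Interval dtype.
Sequence of scalars: returns a Series for Series x or a Categorical for all other inputs. The values stored within are whatever the type in the sequence is.
False: returns an ndarray of integers.

bins:
numpy.ndarray or IntervalIndex.
The computed or specified bins. Only returned when retbins=True. For scalar or sequence bins, this is an ndarray with the computed bins.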

Example:

import numpy as np
import pandas as pd
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)

#[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
#Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]]

pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
#([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
#Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]],
#array([0.994, 3.   , 5.   , 7.   ]))
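
Besides an integer bin count, bins can also be an IntervalIndex that defines the exact intervals. The following is a minimal sketch; the intervals are made up for illustration, and an extra value 9 is added to show what happens outside all intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical, non-overlapping intervals: (0, 2], (2, 5], (5, 8]
intervals = pd.IntervalIndex.from_tuples([(0, 2), (2, 5), (5, 8)])

# 9 lies outside every interval and becomes NaN (code -1)
result = pd.cut(np.array([1, 7, 5, 4, 6, 3, 9]), bins=intervals)

print(result)
print(result.codes)
```

When bins is an IntervalIndex, the right and labels arguments are ignored, as noted in the parameter descriptions.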

pandas.qcut

pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Quantile-based discretization function.

Discretize variable into equal-sized buckets based on rank or based on sample quantiles. For example 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point.
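
The "1000 values for 10 quantiles" case can be sketched as follows. The data is randomly generated here purely for illustration, with a fixed seed as an assumption so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
values = rng.normal(size=1000)        # 1000 hypothetical continuous values

# labels=False returns the decile index (0..9) of each value
deciles = pd.qcut(values, 10, labels=False)

# With distinct values, every decile receives the same number of points
bucket_sizes = np.bincount(deciles)
print(bucket_sizes)
```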

Parameters

x:1d ndarray or Series
q:int or list-like of float
Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles.
labels:array or False, default None
Used as labels for the resulting bins. Must be of the same length as the resulting bins. If False, return only integer indicators of the bins. If True, raises an error.
retbins:bool, optional
Whether to return the (bins, labels) or not. Can be useful if bins is given as a scalar.
precision:int, optional
The precision at which to store and display the bins labels.
duplicates:{default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.
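
A short sketch of the q, labels and duplicates parameters follows. The data and label names are hypothetical and only meant to illustrate the behavior.

```python
import pandas as pd

data = pd.Series(range(1, 101))   # hypothetical values 1..100

# q as an int: four quartiles with custom labels
quartile = pd.qcut(data, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# q as a list of quantiles: bottom 10%, middle 80%, top 10%
custom = pd.qcut(data, q=[0, 0.1, 0.9, 1.0], labels=["low", "mid", "high"])

# Heavily repeated values produce identical quantile edges
repeated = [1, 1, 1, 1, 2, 3]
try:
    pd.qcut(repeated, 4)
    raised = False
except ValueError:
    raised = True

# duplicates="drop" merges the identical edges, leaving fewer bins than requested
merged = pd.qcut(repeated, 4, duplicates="drop")

print(quartile.value_counts())
print(custom.value_counts())
print(merged.categories)
```

Note that with duplicates="drop" the number of resulting bins can be smaller than q, so a fixed-length labels list may no longer match.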

Returns

out:Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series of type category if input is a Series else Categorical. Bins are represented as categories when categorical data is returned.
bins:ndarray of floats
Returned only if retbins is True.
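
One possible use of retbins=True is to compute quantile edges on one dataset and reuse them on another with pd.cut. This is a sketch under that assumption; the "train" and "new" data below are hypothetical.

```python
import pandas as pd

train = pd.Series(range(5))   # hypothetical reference values 0..4

# retbins=True also returns the quantile edges as an ndarray of floats
train_bins, edges = pd.qcut(train, 4, retbins=True)

# Reuse the same edges on new data; include_lowest keeps the minimum edge inclusive
new_values = pd.Series([0.5, 3.5])
new_bins = pd.cut(new_values, bins=edges, include_lowest=True, labels=False)

print(edges)
print(new_bins.tolist())
```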

Example:

import pandas as pd
pd.qcut(range(5), 4)

#[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
#Categories (4, interval[float64]): [(-0.001, 1.0] < (1.0, 2.0] < (2.0, 3.0] < (3.0, 4.0]]
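
To make the difference between the two functions concrete, the sketch below applies both to the same skewed data. The values are made up for illustration: pd.cut splits the value range into equal-width bins, while pd.qcut splits the data into bins holding roughly equal numbers of points.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # hypothetical skewed data

# Equal-width: the range 1..100 is split at its midpoint, so only 100 lands in bin 1
equal_width = pd.cut(values, 2, labels=False)

# Equal-frequency: the split is at the median, so each bin gets five values
equal_freq = pd.qcut(values, 2, labels=False)

width_counts = equal_width.value_counts().sort_index().tolist()
freq_counts = equal_freq.value_counts().sort_index().tolist()
print(width_counts)
print(freq_counts)
```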