A Series has a number of built-in functions that perform an operation on each of its elements in turn.
First, applying a function
apply calls a function on each value of the original Series:
Series.apply(self, func, convert_dtype=True, args=(), **kwds)
Parameter Notes:
- func: the function to apply; it may be a user-defined function or a NumPy function
- convert_dtype: defaults to True, in which case pandas tries to find a better data type for the results of func; if set to False, the result is left as dtype=object.
- args: a tuple of positional arguments passed to func after the Series value
- **kwds: keyword arguments passed to func; there may be zero, one or several
The differences between positional arguments and keyword arguments are:
- Positional arguments are matched to parameters by position; keyword arguments are matched by parameter name.
- There can be any number of keyword arguments and their names are not fixed, but they must come last in the call to apply(). For example, if the keyword arguments k1, k2 and k3 are passed, func receives them as a dictionary kwargs with the keys k1, k2 and k3.
- There is only one args parameter: all positional arguments are packed into a single tuple.
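The two kinds of argument can be combined in one call. The following is a minimal sketch; the function scale_and_shift and its parameters offset and factor are hypothetical names chosen for illustration only.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

def scale_and_shift(x, offset, **kwargs):
    # x is one value of the Series.
    # offset arrives positionally, taken from the args tuple.
    # Anything passed to apply() by name is collected in the kwargs dictionary.
    factor = kwargs.get("factor", 1)
    return (x - offset) * factor

# args must be a tuple, hence the trailing comma for a single value.
result = s.apply(scale_and_shift, args=(5,), factor=2)
print(result)
```

Inside the function, kwargs is an ordinary dictionary, here {'factor': 2}.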
1, passing a custom function (using positional arguments)
Create a custom function and apply it to the Series:
>>> s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])
>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value
>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64
2, passing a custom function (using keyword arguments)
As the example shows, the keyword arguments must come after func in the call to apply:
>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x
>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
3, passing a function defined in NumPy
>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
Second, aggregation
agg is an abbreviation of aggregate, and the two functions are equivalent. They aggregate the Series, so each function that is called should return a single scalar value. The aggregation operates on all elements of the Series, and the specific operation is determined by the func parameter:
Series.agg(self, func, axis=0, *args, **kwargs)
Series.aggregate(self, func, axis=0, *args, **kwargs)
Parameter Notes:
- func: a function, a function name (string), or a list of functions and/or function names
- axis: for a Series, axis can only be 0
For example, to get the minimum and maximum values of a Series:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.agg(['min', 'max'])
min    1
max    4
dtype: int64
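The shape of the result depends on what is passed as func. As a small sketch using only function names: a single name gives a scalar, while a list of names gives a Series indexed by those names.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# A single function name returns one scalar.
total = s.agg("sum")

# A list of names returns a Series with one entry per function.
summary = s.agg(["min", "max", "mean"])

# aggregate is the long spelling of agg.
same_total = s.aggregate("sum")

print(total)
print(summary)
```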
Third, transformation
transform calls a function that converts the values of the Series and returns a result of the same length. It is very similar to apply, except that transform can call several functions at once, while apply typically calls a single function:
Series.transform(self, func, axis=0, *args, **kwargs)
Parameter Notes:
- func: a function, a function name, or a list of functions
>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt       exp
0  0.000000  1.000000
1  1.000000  2.718282
2  1.414214  7.389056
Fourth, mapping
map replaces each value of the Series with another value:
Series.map(self, arg, na_action=None)
Parameter Notes:
- arg: the mapping; it may be a function, a dictionary or a Series
- na_action: defaults to None, in which case NaN values are passed to the mapping like any other value; if set to 'ignore', NaN values are propagated to the result without being passed to the mapping.
arg is usually a dictionary: each value of the Series is matched against the dictionary keys and replaced by the corresponding dictionary value. Values that have no matching key become NaN.
When arg is a Series, the matching uses its index: each value of the original Series is looked up in the index of arg and mapped to the corresponding value of arg.
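The three kinds of mapping can be sketched as follows. The data (animal names) and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(["cat", "dog", np.nan, "rabbit"])

# Dictionary: values are matched against the keys.
# "rabbit" has no key, so it becomes NaN.
by_dict = s.map({"cat": "kitten", "dog": "puppy"})

# Series: values are looked up in the index of the mapping Series.
lookup = pd.Series(["kitten", "puppy"], index=["cat", "dog"])
by_series = s.map(lookup)

# Function with na_action="ignore": the NaN is not passed to the function.
# Without it, x.upper() would fail on the missing value.
by_func = s.map(lambda x: x.upper(), na_action="ignore")

print(by_dict)
print(by_series)
print(by_func)
```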
Fifth, grouping
groupby groups the Series and returns a grouping object, on which an aggregation function can be called to get an aggregate value for each group:
Series.groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
Parameter Notes:
- by: determines the groups; the value may be a function, a label or list of labels, or a mapping (a dictionary or a Series)
1, by is a function
If by is a function, it is called on each value of the Series index:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.groupby(by=lambda x: x < 3).count()
False    1
True     3
dtype: int64
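Inside the function, the index value can be used to look up the element value, so that the groups depend on the values (this works here because the default index coincides with the positions that iat expects):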
>>> s.groupby(by=lambda x: s.iat[x] < 3).count()
False    2
True     2
dtype: int64
2, by is a list of labels
If by is a label or a list of labels, the data is grouped by the values in the columns with those labels; this form is normally used with a DataFrame.
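A small sketch of grouping a DataFrame by column labels; the column names city, year and sales are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["London", "London", "Helsinki", "Helsinki"],
    "year": [2020, 2021, 2020, 2020],
    "sales": [1, 2, 3, 4],
})

# One label: one group per distinct city.
by_city = df.groupby(by="city")["sales"].sum()

# List of labels: one group per distinct (city, year) combination.
by_city_year = df.groupby(by=["city", "year"])["sales"].sum()

print(by_city)
print(by_city_year)
```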
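3, by is a mapping (dictionary)
When a dictionary is used as the mapping, the dictionary keys are matched against the index of the Series, and the elements are grouped by the corresponding dictionary values. Index labels that have no key in the dictionary are left out; in the example below, index 0 has no key, so only three elements are counted.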
>>> s.groupby(by={1: 'a', 2: 'a', 3: 'b', 4: 'b'}).count()
a    2
b    1
dtype: int64
4, by is a mapping (Series)
When a Series is used as the mapping, its values are the group keys: elements of the original Series that correspond to the same value of by belong to the same group. The original Series and by are matched by aligning their indexes.
>>> s.groupby(by=pd.Series(data=[1, 2, 1, 1], index=[0, 2, 3, 1])).mean()
1    2.333333
2    3.000000
dtype: float64
How does index alignment work?
The data of the by Series is 1, 2, 1, 1, which means that the original Series is divided into two groups, with the group keys 1 and 2.
The index of the by Series is 0, 2, 3, 1. That is, the elements of the original Series at index 0, 3 and 1 get group key 1, and the element at index 2 gets group key 2.
After the indexes are aligned, the values 1, 2 and 4 of the original Series belong to group 1 and the value 3 belongs to group 2; the mean of each group is then calculated.
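The alignment can be made visible with reindex, which reorders by to follow the index of the original Series. This is a sketch for illustration only, not necessarily how pandas performs the alignment internally.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
by = pd.Series(data=[1, 2, 1, 1], index=[0, 2, 3, 1])

# Reorder the group keys so that they follow the index of s:
# index 0 -> key 1, index 1 -> key 1, index 2 -> key 2, index 3 -> key 1.
keys = by.reindex(s.index)

# Side-by-side view of each value and the group it falls into.
aligned = pd.DataFrame({"value": s, "key": keys})
print(aligned)

means = s.groupby(by=by).mean()
print(means)
```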
Sixth, rolling
rolling computes over a rolling window: an aggregate value is calculated for each window, and the window moves forward one step at a time (one step is one element):
Series.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
Parameter Notes:
- window: the size of the rolling window, or an offset; with an integer, every window has the same fixed size.
- min_periods: the minimum number of values required in a window; if the window contains fewer than min_periods elements, the result is NaN. For an integer window, min_periods defaults to the value of the window parameter.
For example, when window is set to 2 and min_periods is not set, a window must contain 2 elements to produce a value. The window for the first element of the Series contains only one value, so it returns NaN.
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).sum()
0    NaN
1    3.0
2    5.0
3    7.0
dtype: float64
>>> s.rolling(window=2, min_periods=1).sum()
0    1.0
1    3.0
2    5.0
3    7.0
dtype: float64
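To make the windows explicit, the first result above can be reproduced with a plain loop. This is only a sketch of the idea for an integer window with the default min_periods; the variable names are made up.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
window = 2

manual = []
for i in range(len(s)):
    # The window ends at element i and holds at most `window` elements.
    chunk = s.iloc[max(0, i - window + 1): i + 1]
    if len(chunk) >= window:
        manual.append(float(chunk.sum()))
    else:
        # Too few elements in the window: no value.
        manual.append(float("nan"))
manual = pd.Series(manual, dtype="float64")

rolled = s.rolling(window).sum()
print(manual.equals(rolled))
```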
Seventh, expanding
expanding starts from the first element of the Series and aggregates all elements up to and including the current one. When the aggregation function is sum, it computes the cumulative sum from the first element:
Series.expanding(self, min_periods=1, center=False, axis=0)
For example, the cumulative sum of the Series 1, 2, 3, 4, starting from the first element:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.expanding().sum()
0     1.0
1     3.0
2     6.0
3    10.0
dtype: float64
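For sum, the expanding result matches cumsum apart from the float dtype, and other aggregations work on the same growing window. A short sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

running_sum = s.expanding().sum()    # same values as s.cumsum(), as floats
running_mean = s.expanding().mean()  # mean of all elements seen so far

print(running_sum)
print(running_mean)
```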
Eighth, exponentially weighted moving average
ewm (Exponentially Weighted Moving) applies exponential weights to the elements of a Series; it is usually used to calculate a weighted average:
Series.ewm(self, com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
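1, Parameter Notes
For exponential weighting, the smoothing factor alpha is specified by exactly one of four parameters:
- com: center of mass, alpha = 1 / (1 + com), with com >= 0
- span: alpha = 2 / (span + 1), with span >= 1
- halflife: alpha = 1 - exp(-ln(2) / halflife), with halflife > 0
- alpha: the smoothing factor given directly, with 0 < alpha <= 1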
adjust: whether to divide by a decaying adjustment factor in the early periods, to account for the imbalance in the relative weights.
- When adjust is True, the weighted mean of the first n values is calculated with the weights (1-alpha) ** (n-1), (1-alpha) ** (n-2), ..., 1-alpha, 1 (oldest value first) and divided by the sum of the weights.
- When adjust is False, the weighted mean is calculated recursively: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i].
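Both settings of adjust can be checked by hand. The sketch below follows the two formulas above literally and compares the results with pandas; the variable names are made up, and alpha = 0.8 is an arbitrary choice.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
alpha = 0.8

# adjust=True: weights (1-alpha)**(i-j) for elements j = 0..i,
# normalised by the sum of the weights.
adjusted = []
for i in range(len(s)):
    weights = [(1 - alpha) ** (i - j) for j in range(i + 1)]
    total = sum(w * x for w, x in zip(weights, s.iloc[: i + 1]))
    adjusted.append(total / sum(weights))

# adjust=False: the recursive form, started from the first element.
recursive = [float(s.iloc[0])]
for x in s.iloc[1:]:
    recursive.append((1 - alpha) * recursive[-1] + alpha * x)

pandas_adjusted = s.ewm(alpha=alpha, adjust=True).mean().tolist()
pandas_recursive = s.ewm(alpha=alpha, adjust=False).mean().tolist()

print(adjusted, pandas_adjusted)
print(recursive, pandas_recursive)
```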
2, the meaning of the exponentially weighted moving average
The formula of the exponentially weighted moving average (EWMA) is: EWMA(t) = a * Y(t) + (1-a) * EWMA(t-1), t = 1, 2, ..., n
Its meaning is: given the value actually observed at time t, EWMA(t) can be calculated. Here EWMA(t) is the estimated value at time t; Y(t) is the measured value at time t; n is the total number of observations; a (0 < a < 1) is the weighting factor, which sets the weight of the current measurement relative to the historical measurements.
It is called exponential weighting because the weights decrease exponentially with age: expanding the recursion, the measurement from k steps back carries the weight a * (1-a) ** k. The closer a is to 1, the higher the weight of the current sample and the lower the weight of past measurements, so the estimate is more timely; the closer a is to 0, the less timely it is.
This can be described as a trade-off between stability and the ability to follow sudden changes: stability decreases as a increases. With a small a, the mean draws more on past measurements and less on the current value, so it is more stable; with a large a, the mean draws more on the current measured value, so it is more volatile. For example, for the Series below, the larger factor a = 0.8 gives means closer to the current value, and the smaller factor a = 0.2 gives means farther from it:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.ewm(alpha=0.8).mean()
0    1.000000
1    1.833333
2    2.774194
3    3.756410
dtype: float64
>>> s.ewm(alpha=0.2).mean()
0    1.000000
1    1.555556
2    2.147541
3    2.775068
dtype: float64
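The effect is easier to see on a Series with a sudden jump. The data below is made up for illustration: the value steps from 1 to 10, and the two means follow the step at different speeds.

```python
import pandas as pd

# Hypothetical data with a sudden change at position 4.
s = pd.Series([1, 1, 1, 1, 10, 10, 10])

fast = s.ewm(alpha=0.8).mean()  # follows the current value closely
slow = s.ewm(alpha=0.2).mean()  # leans on the history, so it lags

comparison = pd.DataFrame({"value": s, "a=0.8": fast, "a=0.2": slow})
print(comparison)
```

Right after the jump, the a = 0.8 column is already close to 10, while the a = 0.2 column is still much nearer to 1.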