pandas Learning 4: Processing a Series (apply, aggregate, transform, map, group, rolling, expanding, exponentially weighted moving average)

A Series has a number of built-in functions that loop over its elements and execute an operation on each of them.

First, applying a function

Apply a function to each value of the original Series:

Series.apply(self, func, convert_dtype=True, args=(), **kwds)

Parameter Notes:

  • func: the function to apply; it may be a user-defined function or a NumPy function
  • convert_dtype: the default value is True, which tries to find a better data type for the results of func; if set to False, the result is left as dtype=object.
  • args: a tuple of positional arguments passed to func after the Series value
  • **kwds: keyword arguments passed to func; there may be zero, one, or several

The differences between positional parameters and keyword parameters are:

  • Positional parameters are passed by matching position; keyword parameters are passed by matching the parameter name.
  • There can be several keyword parameters, and their names are not fixed; they can only appear at the end of the apply() call. For example, if the keyword parameters are k1, k2, k3, then func receives them in a dictionary, kwargs, with the keys k1, k2, k3.
  • There is only one args parameter: all positional arguments are packed into that single tuple.

1, passing a custom function (using positional arguments)

Create a custom function and apply it to the Series:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])
>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value
>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

2, passing a custom function (using keyword arguments)

As you can see, the keyword arguments can only come after the function in the apply call:

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x
>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
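Positional and keyword arguments can also be combined in one call. The following is a minimal sketch for illustration only; scale_and_shift, factor, and offset are hypothetical names that do not come from pandas.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# Hypothetical helper: x is the Series value, factor is passed by
# position through args, and offset is passed by keyword.
def scale_and_shift(x, factor, offset=0):
    return x * factor + offset

# args supplies the positional argument; offset travels through **kwds.
result = s.apply(scale_and_shift, args=(2,), offset=1)
print(result)
```

Each value is first multiplied by factor and then shifted by offset, so the positional tuple and the keyword arguments reach func together.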

3, passing a function defined by NumPy

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64

Second, aggregation

agg is an abbreviation of aggregate, and the two functions are equivalent. They perform an aggregation over the Series, and each function called returns a single scalar value. The aggregation is performed over all elements of the Series, and the specific operation is determined by the func parameter:

Series.agg(self, func, axis=0, *args, **kwargs)
Series.aggregate(self, func, axis=0, *args, **kwargs)

Parameter Notes:

  • func: a function, a function name (string), or a list of functions and function names
  • axis: for a Series, axis can only be 0

For example, compute the minimum and maximum values of the Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.agg(['min', 'max'])
min   1
max   4
dtype: int64
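As a small sketch of the two call forms described above, a single function name returns a scalar, while a list of names returns a Series indexed by those names. Only string names are used here, to stay within the forms listed in the parameter notes.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# A single function name returns one scalar value.
total = s.agg('sum')

# A list of function names returns a Series, one entry per name.
summary = s.agg(['min', 'max', 'mean'])

print(total)
print(summary)
```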

Third, transform

transform calls a function to convert the values of the Series. It is very similar to apply, except that transform can call several functions at once, whereas apply is normally passed a single function:

Series.transform(self, func, axis=0, *args, **kwargs)

Parameter Notes:

  • func: a function, a function name, or a list of functions

>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056
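To make the comparison with apply concrete, here is a minimal sketch: with a single function, transform produces the same values as apply, and with a list of functions it produces a DataFrame with one column per function.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(3))

# With one function, transform and apply produce the same values.
single = s.transform(np.sqrt)
applied = s.apply(np.sqrt)
print(np.allclose(single, applied))

# With a list of functions, transform returns a DataFrame whose
# columns are named after the functions.
multi = s.transform([np.sqrt, np.exp])
print(list(multi.columns))
```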

Fourth, mapping

Map the values of the Series to other values:

Series.map(self, arg, na_action=None)

Parameter Notes:

  • arg: the mapping; it may be a function, a dictionary, or a Series
  • na_action: the default is None, which passes NaN values to the mapping like any other value; if it is 'ignore', NaN values are propagated to the result without being passed to the mapping.

arg is usually a dictionary: the values of the Series are matched against the dictionary keys, and each value of the original Series is replaced by the corresponding dictionary value.

When arg is a Series, its index is used for matching: each value of the original Series is looked up in the index of arg and mapped to the corresponding value of arg.
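The three kinds of arg can be sketched as follows. The animal names are sample data chosen only for illustration; values with no match in the dictionary or in the lookup Series become NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# Dictionary: Series values are matched against the keys.
# 'rabbit' has no key, so it maps to NaN.
by_dict = s.map({'cat': 'kitten', 'dog': 'puppy'})

# Series: Series values are matched against the index of the lookup Series.
lookup = pd.Series(['kitten', 'puppy'], index=['cat', 'dog'])
by_series = s.map(lookup)

# Function: na_action='ignore' leaves NaN untouched instead of
# passing it to the function.
by_func = s.map('I am a {}'.format, na_action='ignore')

print(by_dict)
print(by_series)
print(by_func)
```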

Fifth, grouping

Group the Series; an aggregation function can then be called on the returned groupby object to get an aggregate value for each group:

Series.groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Parameter Notes:

  • by: determines the groups; its value may be a function, a label or a list of labels, or a mapping

1, by is a function

If by is a function, it is called on each value of the Series index:

>>> s=pd.Series([1,2,3,4])
>>> s.groupby(by=lambda x: x<3).count()
False    1
True     3
dtype: int64

The element values of the Series can be accessed through the index value:

>>> s.groupby(by=lambda x: s.iat[x]<3).count()
False    2
True     2
dtype: int64
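Because the function receives index labels, it is most natural when the labels themselves carry information. A minimal sketch, assuming a string index and using the built-in len as the grouping function:

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# len is called on each index label, so the group key is the length
# of the city name: 'London' is 6, 'New York' and 'Helsinki' are 8.
by_label_length = s.groupby(by=len).sum()
print(by_label_length)
```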

2, by is a list of labels

If by is a list of labels, the data is typically grouped by the values in those columns; this form is normally used with a DataFrame.
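A short sketch of the DataFrame case follows. The column names city, year, and sales and all the figures are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'city': ['London', 'London', 'Helsinki', 'Helsinki'],
    'year': [2019, 2020, 2019, 2019],
    'sales': [10, 20, 5, 7],
})

# Rows that share the same (city, year) pair fall into the same group.
totals = df.groupby(by=['city', 'year'])['sales'].sum()
print(totals)
```

The result is indexed by the combinations of city and year that actually occur in the data.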

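3, by is a mapping (dictionary)

When a dictionary is used as the mapping, the dictionary keys are matched against the index of the Series, and the Series is grouped by the corresponding dictionary values. In the example below, index 0 has no key in the dictionary, so that element does not appear in any group: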

>>> s.groupby(by={1:'a',2:'a',3:'b',4:'b'}).count()
a    2
b    1
dtype: int64

4, by is a mapping (Series)

When a Series is used as the mapping, the values of the by Series determine the groups of the original Series: elements whose by values are equal belong to the same group. The original Series and the by Series are matched by aligning their indexes.

>>> s.groupby(by=pd.Series(data=[1,2,1,1],index=[0,2,3,1])).mean()
1    2.333333
2    3.000000
dtype: float64

How does index alignment work?

The data of the by Series is 1, 2, 1, 1, which means that the original Series is divided into two groups, with the group keys 1 and 2.

The index of the by Series is 0, 2, 3, 1. That is, when the index of the original Series is 0, 3, or 1, the corresponding group key is 1; when the index of the original Series is 2, the corresponding group key is 2.

After index alignment, the values 1, 2, 4 of the original Series belong to group 1 and the value 3 belongs to group 2; the mean of each group is then calculated.
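The alignment step can be made visible by reordering the by Series to follow the index of the original Series. The sketch below uses reindex for that purpose; this is one way to inspect the alignment, not necessarily what groupby does internally.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
by = pd.Series(data=[1, 2, 1, 1], index=[0, 2, 3, 1])

# Put the group keys in the order of the index of s (0, 1, 2, 3).
aligned = by.reindex(s.index)
print(aligned.tolist())

# Grouping by the aligned keys gives the same means as the example above.
means = s.groupby(aligned).mean()
print(means)
```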

Sixth, rolling

Compute over a rolling window: an aggregate value is calculated for each window, and the window rolls forward one step at a time (one step is one element):

Series.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)

Parameter Notes:

  • window: the size of the rolling window (a number of elements) or an offset; when it is a number, every window has that fixed size.
  • min_periods: the minimum number of values required in each window; if the number of elements in the window is less than min_periods, NaN is returned. For an integer window, min_periods defaults to the value of the window parameter.

For example, with window set to 2 and min_periods not set, a window must contain 2 values to produce a result. The window for the first element of the Series contains only one value, so NaN is returned for it.

>>> s=pd.Series([1,2,3,4])
>>> s.rolling(2).sum()
0    NaN
1    3.0
2    5.0
3    7.0
dtype: float64
>>> s.rolling(window=2,min_periods =1).sum()
0    1.0
1    3.0
2    5.0
3    7.0
dtype: float64
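The same mechanism gives a simple moving average when mean is used instead of sum. The sketch below compares the rolling result with a manual computation over explicit slices; the manual loop is only for checking and is not how pandas computes it.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# 3-element moving average; the first two windows are incomplete.
moving_avg = s.rolling(window=3).mean()

# Manual check: average each full window of three consecutive values.
manual = [s.iloc[i - 2:i + 1].mean() for i in range(2, len(s))]

print(moving_avg)
print(manual)
```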

Seventh, expanding

Expanding means starting from the first element of the Series and aggregating element by element toward the end. When the aggregation function is sum, it computes the cumulative sum starting from the first element:

Series.expanding(self, min_periods=1, center=False, axis=0)

For example, compute the cumulative sum of the Series 1, 2, 3, 4 from the first element:

>>> s=pd.Series([1,2,3,4])
>>> s.expanding().sum()
0     1.0
1     3.0
2     6.0
3    10.0
dtype: float64
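As a quick sketch, the expanding sum matches cumsum, and other aggregations such as mean work over the same growing window.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Expanding sum: each result covers all elements seen so far.
running_sum = s.expanding().sum()

# For sum, this gives the same values as cumsum (as floats).
cumulative = s.cumsum()

# Expanding mean: the average of all elements up to each position.
running_mean = s.expanding().mean()

print(running_sum)
print(cumulative)
print(running_mean)
```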

Eighth, exponentially weighted moving average

ewm (Exponentially Weighted Moving) applies exponential weighting to the elements of a Series; usually the weighted average is then calculated:

Series.ewm(self, com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)

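1, parameter notes

In exponential weighting, the smoothing factor alpha can be specified in four ways:

  • com: the center of mass, alpha = 1 / (1 + com), with com >= 0
  • span: alpha = 2 / (span + 1), with span >= 1
  • halflife: alpha = 1 - exp(ln(0.5) / halflife), with halflife > 0
  • alpha: the smoothing factor given directly, with 0 < alpha <= 1

adjust: whether to divide by a decaying adjustment factor in the early periods, to account for the imbalance in relative weights.

  • When adjust is True, the weighted mean is calculated with the weights (1-alpha) ** (n-1), (1-alpha) ** (n-2), ..., 1-alpha, 1, applied to the values from oldest to newest and normalized by their sum.
  • When adjust is False, the weighted mean is calculated as: weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i].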
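The two settings of adjust can be reproduced by hand. The sketch below implements both formulas directly and compares them with pandas; ewm_adjusted and ewm_recursive are hypothetical helper names written for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
alpha = 0.8
values = s.to_numpy()

def ewm_adjusted(values, alpha):
    """adjust=True: weighted mean with weights (1-alpha)**k, normalized."""
    out = []
    for t in range(len(values)):
        # The oldest value gets the largest exponent, the newest gets 0.
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[:t + 1]) / weights.sum())
    return np.array(out)

def ewm_recursive(values, alpha):
    """adjust=False: recursive form starting from the first value."""
    out = [values[0]]
    for x in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return np.array(out)

print(ewm_adjusted(values, alpha))
print(s.ewm(alpha=alpha, adjust=True).mean().to_numpy())
print(ewm_recursive(values, alpha))
print(s.ewm(alpha=alpha, adjust=False).mean().to_numpy())
```

The two variants differ most for the first few elements, where few weights are available; as more elements are included the normalizing sum approaches 1/alpha and the results converge.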

2, the meaning of the exponentially weighted moving average

The formula of the exponentially weighted moving average (EWMA) is: EWMA(t) = a * Y(t) + (1 - a) * EWMA(t-1), t = 1, 2, ..., n

Its meaning is: at time t, EWMA(t) is derived from the actual observed value, where EWMA(t) is the estimated value at time t; Y(t) is the measured value at time t; n is the total number of observations; and a (0 < a < 1) is the weighting coefficient, the weight given to the current measurement, with 1 - a given to the previous estimate.

It is called exponential weighting because the weighting coefficients decrease exponentially, that is, the weight of each measurement decays exponentially with time. The closer the coefficient a is to 1, the higher the weight of the current sample value and the lower the weight of past measurements, so the estimate is more timely; the closer a is to 0, the less timely it is.

This can be described as a trade-off between stability and the ability to follow sudden changes: stability decreases as a increases. With a small coefficient a, the mean draws more on past measurements and less on the current value, and shows strong stability; with a larger coefficient a, the mean draws more on the current measurement, and shows strong volatility. For example, for the Series below, setting a larger coefficient a = 0.8 and a smaller coefficient a = 0.2 yields means that are closer to and farther from the current value, respectively:

>>> s=pd.Series([1,2,3,4])
>>> s.ewm(alpha=0.8).mean()
0    1.000000
1    1.833333
2    2.774194
3    3.756410
dtype: float64
>>> s.ewm(alpha=0.2).mean()
0    1.000000
1    1.555556
2    2.147541
3    2.775068
dtype: float64
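The alpha = 0.8 result can also be reached through the other parameters in the ewm signature. The sketch below assumes the relationships given in the pandas documentation, alpha = 1 / (1 + com), alpha = 2 / (span + 1), and alpha = 1 - exp(ln(0.5) / halflife), and solves each of them for alpha = 0.8.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])
alpha = 0.8

# Solve each relationship for the parameter that yields alpha = 0.8.
com = 1 / alpha - 1                          # from alpha = 1 / (1 + com)
span = 2 / alpha - 1                         # from alpha = 2 / (span + 1)
halflife = np.log(0.5) / np.log(1 - alpha)   # from alpha = 1 - exp(ln(0.5) / halflife)

by_alpha = s.ewm(alpha=alpha).mean()
by_com = s.ewm(com=com).mean()
by_span = s.ewm(span=span).mean()
by_halflife = s.ewm(halflife=halflife).mean()

print(by_alpha.to_numpy())
print(by_com.to_numpy())
print(by_span.to_numpy())
print(by_halflife.to_numpy())
```

Only one of com, span, halflife, and alpha is passed in each call, since they are alternative ways of specifying the same decay.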

Reference documents:

pandas.Series.apply

Origin www.cnblogs.com/ljhdo/p/10424224.html