pandas exercise

Classify the passengers in the DataFrame by age

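Ages 0-11 are class 0, 12-22 are class 1, 23-33 are class 2, 34-44 are class 3, 45-55 are class 4, 56-66 are class 5, and all other ages are class 6.
Replace Age with each passenger's age class; where an age is missing, fill in the passengers' mean age first.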
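
As a rough illustration of the rule above, the bands can be sketched with `pd.cut` on a small made-up frame. This is not the approach used in this post (that follows below); the sample ages, the `AgeClass` column name, and the reading of "0-11" as "younger than 12" (so that a fractional age such as 11.5 lands in class 0) are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real exercise uses the Titanic passenger table.
df = pd.DataFrame({"Age": [0.42, 5, 11.5, 12, 22, 30, 44, 50, 66, 80, np.nan]})

# Fill missing ages with the mean of the known ages.
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Left-closed bins (assumed reading of the rule):
# [0, 12) -> 0, [12, 23) -> 1, ..., [56, 67) -> 5.
edges = [0, 12, 23, 34, 45, 56, 67]
codes = pd.cut(df["Age"], bins=edges, right=False, labels=False)

# Ages outside every bin come back as NaN; they belong to class 6.
df["AgeClass"] = codes.fillna(6).astype(int)
print(df)
```

Writing the result to a separate column keeps the original ages available for checking; the exercise itself overwrites `Age`.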

Before running the code:

Code:

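import pandas as pd  # titani_c is assumed to be a DataFrame loaded earlier

def Age_select(a):
    # Upper bounds are exclusive so that fractional ages (e.g. 11.5)
    # stay in their band instead of falling through to class 6.
    if 0 <= a < 12:
        return 'class 0'
    if 12 <= a < 23:
        return 'class 1'
    if 23 <= a < 34:
        return 'class 2'
    if 34 <= a < 45:
        return 'class 3'
    if 45 <= a < 56:
        return 'class 4'
    if 56 <= a < 67:
        return 'class 5'
    return 'class 6'

# Fill missing ages with the mean age; assigning the result back avoids
# relying on inplace modification of a column selection.
titani_c['Age'] = titani_c['Age'].fillna(titani_c['Age'].mean())
# axis=1 passes one row at a time; each age is replaced by its class label.
titani_c['Age'] = titani_c.apply(lambda x: Age_select(x.Age), axis=1)
titani_c.head(10)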

Output:

This exercise mainly uses the DataFrame's apply() function; its API documentation follows.

 titanic_df.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
Docstring:
Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is
either the DataFrame's index (``axis=0``) or the DataFrame's columns
(``axis=1``). By default (``result_type=None``), the final return type
is inferred from the return type of the applied function. Otherwise,
it depends on the `result_type` argument.

Parameters
----------
func : function
    Function to apply to each column or row.
axis : {0 or 'index', 1 or 'columns'}, default 0
    Axis along which the function is applied:

    * 0 or 'index': apply function to each column.
    * 1 or 'columns': apply function to each row.
broadcast : bool, optional
    Only relevant for aggregation functions:

    * ``False`` or ``None`` : returns a Series whose length is the
      length of the index or the number of columns (based on the
      `axis` parameter)
    * ``True`` : results will be broadcast to the original shape
      of the frame, the original index and columns will be retained.

    .. deprecated:: 0.23.0
       This argument will be removed in a future version, replaced
       by result_type='broadcast'.

raw : bool, default False
    * ``False`` : passes each row or column as a Series to the
      function.
    * ``True`` : the passed function will receive ndarray objects
      instead.
      If you are just applying a NumPy reduction function this will
      achieve much better performance.
reduce : bool or None, default None
    Try to apply reduction procedures. If the DataFrame is empty,
    `apply` will use `reduce` to determine whether the result
    should be a Series or a DataFrame. If ``reduce=None`` (the
    default), `apply`'s return value will be guessed by calling
    `func` on an empty Series
    (note: while guessing, exceptions raised by `func` will be
    ignored).
    If ``reduce=True`` a Series will always be returned, and if
    ``reduce=False`` a DataFrame will always be returned.

    .. deprecated:: 0.23.0
       This argument will be removed in a future version, replaced
       by ``result_type='reduce'``.

result_type : {'expand', 'reduce', 'broadcast', None}, default None
    These only act when ``axis=1`` (columns):

    * 'expand' : list-like results will be turned into columns.
    * 'reduce' : returns a Series if possible rather than expanding
      list-like results. This is the opposite of 'expand'.
    * 'broadcast' : results will be broadcast to the original shape
      of the DataFrame, the original index and columns will be
      retained.

    The default behaviour (None) depends on the return value of the
    applied function: list-like results will be returned as a Series
    of those. However if the apply function returns a Series these
    are expanded to columns.

    .. versionadded:: 0.23.0

args : tuple
    Positional arguments to pass to `func` in addition to the
    array/series.
**kwds
    Additional keyword arguments to pass as keyword arguments to
    `func`.

Notes
-----
In the current implementation apply calls `func` twice on the
first column/row to decide whether it can take a fast or slow
code path. This can lead to unexpected behavior if `func` has
side-effects, as they will take effect twice for the first
column/row.

See also
--------
DataFrame.applymap: For elementwise operations
DataFrame.aggregate: only perform aggregating type operations
DataFrame.transform: only perform transformation type operations

Examples
--------

>>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as
``np.sqrt(df)``):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results
to columns of a DataFrame

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing
``result_type='expand'``. The resulting column names
will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing ``result_type='broadcast'`` will ensure the same shape
result, whether list-like or scalar is returned by the function,
and broadcast it along the axis. The resulting column names will
be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2

Returns
-------
applied : Series or DataFrame
File:      /usr/local/lib/python3.5/dist-packages/pandas/core/frame.py
Type:      method
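
To make the `axis=1` and `args` parameters from the docstring above a little more concrete, here is a minimal sketch. The two-row frame and the helper `is_younger_than` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical two-passenger frame, used only for illustration.
df = pd.DataFrame({"Name": ["A", "B"], "Age": [8.0, 40.0]})

def is_younger_than(row, limit):
    # `row` is a Series holding one row; `limit` arrives through `args`.
    return row["Age"] < limit

# axis=1 calls the function once per row; args supplies the extra
# positional argument after the row itself.
flags = df.apply(is_younger_than, axis=1, args=(12,))

# When only one column is needed, Series.apply hands each scalar value
# to the function directly, with no row object involved.
same_flags = df["Age"].apply(lambda age: age < 12)

print(flags.tolist(), same_flags.tolist())
```

For a function that reads a single column, as `Age_select` does, the `Series.apply` form is usually the simpler choice; the row-wise form is mainly useful when the function needs several columns at once.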
