pandas Learning 6: Index

The index is the row label of a Series or DataFrame, and there may be one or more indexes. If a Series or DataFrame has one index, it is called a single-level index; if it has multiple indexes, it is called a multi-level index. Like a column of data in a Series or DataFrame, an index can hold a variety of data types. The index types are: numeric index (Numeric Index), categorical index (CategoricalIndex), date and time indexes (DatetimeIndex, TimedeltaIndex), period index (PeriodIndex), range index (RangeIndex), interval index (IntervalIndex), and multi-level index (MultiIndex).

A multi-level index (MultiIndex) means that a Series or DataFrame has several index levels. It resembles a two-dimensional relational table: the index of the Series or DataFrame itself has a structure similar to a DataFrame.
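A minimal sketch of a two-level index. The level names (year, quarter) and the values are made up for illustration:

```python
import pandas as pd

# Build a two-level index from tuples; each tuple is one row label.
mi = pd.MultiIndex.from_tuples(
    [("2019", "Q1"), ("2019", "Q2"), ("2020", "Q1")],
    names=["year", "quarter"],
)
s = pd.Series([10, 20, 30], index=mi)

print(mi.nlevels)               # 2
print(list(mi.names))           # ['year', 'quarter']
print(s.loc[("2019", "Q2")])    # 20
print(s.loc["2019"].tolist())   # [10, 20]
```

Selecting with a full tuple returns one value, while selecting with only the first level returns every row under that label.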

The most commonly used indexes are the integer index, the categorical index, and the date index.

First, the basic constructor

The most basic constructor for creating an index:

pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True)

Parameter Notes:

  • data: a one-dimensional array-like object used to create the index; the index is ordered.
  • dtype: the type of the index elements. If it is not specified, pandas infers the type that best fits the data and otherwise uses object.
  • copy: whether to copy the input data.
  • name: the name of the index; the default value is None.
  • tupleize_cols: if True (the default), pandas attempts to create a multi-level index (MultiIndex) when possible, for example when data is a list of tuples.

For example, to create an integer index:

>>> pd.Index([1, 2, 3])
Int64Index([1, 2, 3], dtype='int64')
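The other constructor parameters can be sketched the same way. This is only an illustration; the name "id" and the tuple values are assumptions chosen for the example:

```python
import pandas as pd

# A named index with an explicit element type.
idx = pd.Index([1, 2, 3], dtype="int64", name="id")
print(idx.name)   # id
print(idx.dtype)  # int64

# With the default tupleize_cols=True, a list of tuples becomes a MultiIndex.
multi = pd.Index([("a", 1), ("a", 2), ("b", 1)])
print(type(multi).__name__)  # MultiIndex
print(multi.nlevels)         # 2

# With tupleize_cols=False, the tuples stay plain elements of a flat index.
flat = pd.Index([("a", 1), ("a", 2), ("b", 1)], tupleize_cols=False)
print(isinstance(flat, pd.MultiIndex))  # False
```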

Second, index properties

An index is similar to a column in a two-dimensional relational table and has the following properties:

  • values: the values of the index
  • is_monotonic, is_monotonic_increasing, is_monotonic_decreasing: whether the values are monotonic (is_monotonic is an alias of is_monotonic_increasing)
  • is_unique, has_duplicates: whether the values are unique, and whether there are duplicate values
  • hasnans: whether there are NA values
  • dtype: the data type of the index elements
  • name: the name of the index
  • names: if the index is multi-level (MultiIndex), each level has a name
  • size: the number of index elements
  • T: the transpose of the index, which for an index is the index itself
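A short sketch that reads these properties from a small index. The values and the name "key" are made up for illustration:

```python
import pandas as pd

idx = pd.Index([1, 2, 2, 5], name="key")

print(idx.values.tolist())          # [1, 2, 2, 5]
print(idx.is_monotonic_increasing)  # True: the values never decrease
print(idx.is_monotonic_decreasing)  # False
print(idx.is_unique)                # False: 2 appears twice
print(idx.has_duplicates)           # True
print(idx.hasnans)                  # False
print(idx.dtype)                    # int64
print(idx.name)                     # key
print(idx.size)                     # 4
```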

Third, missing values in the index

Check for missing values: isna() checks each value of the index and returns True where the value is NA and False where it is not. notna() is the inverse: it returns True where the value is not NA and False where it is NA.

Index.isna(self)
Index.notna(self)

Fill missing values: NA values are replaced with a scalar value, and the downcast parameter specifies how to downcast the data type where possible:

Index.fillna(self, value=None, downcast=None)

Drop missing values: the how parameter indicates how missing values are removed, and its valid values are 'any' and 'all':

Index.dropna(self, how='any')
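A minimal sketch of the four functions on an index that contains one NA value. The sample values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1.0, np.nan, 3.0])

print(idx.isna().tolist())    # [False, True, False]
print(idx.notna().tolist())   # [True, False, True]

filled = idx.fillna(0.0)      # replace NA with a scalar
dropped = idx.dropna()        # remove NA entries
print(filled.tolist())        # [1.0, 0.0, 3.0]
print(dropped.tolist())       # [1.0, 3.0]
```

Both fillna() and dropna() return a new index and leave idx unchanged.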

Fourth, sorting an index

Sort by the index values, but return the integer positions that would sort the index rather than the sorted values; the *args and **kwargs parameters are passed to numpy.ndarray.argsort.

Index.argsort(self, *args, **kwargs)

Sort by the index values and return a sorted copy; the return_indexer parameter indicates whether to also return the positions that sort the index:

Index.sort_values(self, return_indexer=False, ascending=True)

For example, take the following index:

>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')

Sorting by index value, argsort() returns the positions that would sort the index:

>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])

To see the index values in sorted order, select them with those positions:

>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')

Of course, you can also return the sorted index directly:

>>> idx.sort_values()
Index(['a', 'b', 'c', 'd'], dtype='object')

To return both the sorted index and the corresponding positions, set the parameter return_indexer=True:

>>> idx.sort_values(return_indexer=True)
(Index(['a', 'b', 'c', 'd'], dtype='object'), array([1, 0, 3, 2], dtype=int64))

Fifth, converting an index

An index can be converted to a list, a DataFrame, a Series, an array (ndarray), and so on; the ravel() function flattens the index.

Index.to_list(self)
Index.to_frame(self, index=True, name=None)
Index.to_series(self, index=None, name=None)
Index.ravel(self, order='C')

Convert the index values to a specified type:

Index.astype(self, dtype, copy=True)
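A small sketch of these conversions. The index name "id" is an assumption for the example, and ravel() is left out here:

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="id")

as_list = idx.to_list()                # plain Python list
as_series = idx.to_series()            # Series whose index and values both come from idx
as_frame = idx.to_frame(index=False)   # DataFrame with a single column named "id"
as_float = idx.astype("float64")       # same labels with a different element type

print(as_list)                 # [1, 2, 3]
print(as_series.tolist())      # [1, 2, 3]
print(list(as_frame.columns))  # ['id']
print(as_float.dtype)          # float64
```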

Sixth, operations on index values

A series of operations can be performed on index values; the most commonly used functions are listed below:

1, return the position of the minimum or maximum value

Index.argmin(self, axis=None, skipna=True, *args, **kwargs)
Index.argmax(self, axis=None, skipna=True, *args, **kwargs)
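For instance, with made-up values:

```python
import pandas as pd

idx = pd.Index([30, 10, 40, 20])

print(idx.argmin())        # 1, the position of the smallest value (10)
print(idx.argmax())        # 2, the position of the largest value (40)
print(idx[idx.argmax()])   # 40
```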

2, delete index values

Delete the specified index values; delete() removes by position (loc), and drop() removes by label:

Index.delete(self, loc)
Index.drop(self, labels, errors='raise')
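A brief sketch of the difference between the two functions, using made-up labels:

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

by_position = idx.delete(0)          # remove the element at position 0
by_label = idx.drop(["b", "d"])      # remove the elements labelled "b" and "d"
print(by_position.tolist())          # ['b', 'c', 'd']
print(by_label.tolist())             # ['a', 'c']

# errors="ignore" skips labels that are not present instead of raising KeyError.
unchanged = idx.drop(["z"], errors="ignore")
print(unchanged.tolist())            # ['a', 'b', 'c', 'd']
```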

3, duplicate values

The drop_duplicates() function removes duplicate values. The valid values of the keep parameter are 'first', 'last', and False: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False removes every value that is duplicated.

Index.drop_duplicates(self, keep='first')

Check whether index values are duplicated; where a value is a duplicate, the corresponding position in the result is True.

Index.duplicated(self, keep='first')
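The effect of each keep value can be sketched on a small index with made-up labels:

```python
import pandas as pd

idx = pd.Index(["a", "b", "a", "c", "b"])

print(idx.drop_duplicates().tolist())             # ['a', 'b', 'c']
print(idx.drop_duplicates(keep="last").tolist())  # ['a', 'c', 'b']
print(idx.drop_duplicates(keep=False).tolist())   # ['c']

# duplicated() marks the repeats; keep decides which occurrence counts as the original.
print(idx.duplicated().tolist())            # [False, False, True, False, True]
print(idx.duplicated(keep=False).tolist())  # [True, True, True, False, True]
```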

4, insert the new value

Index.insert(self, loc, item)
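A minimal example with made-up labels; insert() returns a new index and does not modify the original:

```python
import pandas as pd

idx = pd.Index(["a", "c"])
new_idx = idx.insert(1, "b")   # place "b" at position 1

print(new_idx.tolist())   # ['a', 'b', 'c']
print(idx.tolist())       # ['a', 'c']
```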

5, rename the index (set its name attribute)

Index.rename(self, name, inplace=False)
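A short sketch, assuming the names "old" and "new" purely for illustration. With the default inplace=False a renamed copy is returned:

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="old")
renamed = idx.rename("new")

print(renamed.name)  # new
print(idx.name)      # old, the original index keeps its name
```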

6, unique values of the index

Index.unique(self, level=None)
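A sketch with made-up values. The level argument only matters for a multi-level index, where it selects the level whose unique values are returned:

```python
import pandas as pd

idx = pd.Index([3, 1, 3, 2, 1])
print(idx.unique().tolist())   # [3, 1, 2], in order of first appearance

mi = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
print(mi.unique(level=0).tolist())   # ['a', 'b']
```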

7, get the positions of index values

The first way is to pass a list of index values:

Index.get_indexer(self, target, method=None, limit=None, tolerance=None)

Parameter Notes:

target: the list of index values to look up

method: None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'

  • None: exact matches only
  • pad / ffill: if there is no exact match, use the previous index value
  • backfill / bfill: if there is no exact match, use the next index value
  • nearest: if there is no exact match, use the nearest index value

limit: the maximum number of consecutive labels in target to match inexactly

tolerance: the maximum distance between an original label and a target label for an inexact match; the index values at the matched positions must satisfy abs(index[indexer] - target) <= tolerance.

For a target value that has no match, the returned position is -1.
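The matching methods can be compared on a small sorted, unique index. The values are made up and chosen so that no lookup is an exact tie:

```python
import pandas as pd

idx = pd.Index([10, 20, 30])

# Exact matching only: values that are missing map to -1.
exact = idx.get_indexer([20, 25, 40])
print(exact.tolist())      # [1, -1, -1]

# pad / ffill: fall back to the previous index value.
padded = idx.get_indexer([20, 25, 40], method="pad")
print(padded.tolist())     # [1, 1, 2]

# backfill / bfill: fall back to the next index value.
backfilled = idx.get_indexer([5, 25, 40], method="backfill")
print(backfilled.tolist()) # [0, 2, -1]

# nearest: fall back to the closest index value, optionally within a tolerance.
nearest = idx.get_indexer([24, 26, 100], method="nearest")
print(nearest.tolist())    # [1, 2, 2]
limited = idx.get_indexer([24, 26, 100], method="nearest", tolerance=5)
print(limited.tolist())    # [1, 2, -1]
```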

The second way is to pass a single scalar index value; the position of that value in the index is returned:

Index.get_loc(self, key, method=None, tolerance=None)
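A minimal sketch that uses only the key argument, with made-up labels:

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])
print(idx.get_loc("b"))   # 1

# A label that is not in the index raises KeyError.
try:
    idx.get_loc("z")
    found = True
except KeyError:
    found = False
print(found)              # False

# In a sorted index with duplicates, the result is a slice rather than an integer.
dup = pd.Index(["a", "b", "b"])
print(dup.get_loc("b"))   # slice(1, 3, None)
```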

Seventh, other types of indexes

  • 1, integer index
  • 2, categorical index
  • 3, date index


 


Origin www.cnblogs.com/ljhdo/p/11556410.html