Getting to know pandas
pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools.
It introduces two data types, Series and DataFrame.
Series: index + one-dimensional data
DataFrame: index + two-dimensional data
The original design goal of pandas is to establish a correspondence between data and index. By manipulating the index, you manipulate the data indirectly, without having to care about the dimensionality of the data, which reduces the mental burden.
pandas expects users to treat Series and DataFrame objects as if they were a single piece of data.
When printed, a Series shows two columns: the index on the left and the data on the right.
Comparing NumPy and pandas, the former pays more attention to the structural expression of a set of data, that is, its dimensionality. The latter is more concerned with how the data is used in applications, that is, the relationship between data and index.
Creating a Series
If no index is given, pandas generates one automatically (0, 1, 2, ...). The index can also be specified explicitly using the form index=[...].
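As a minimal sketch of both cases (the values and labels here are illustrative assumptions, not taken from the original notes), a Series can be built from a Python list with or without an explicit index:

```python
import pandas as pd

# Default index: pandas generates 0, 1, 2, ... automatically.
a = pd.Series([9, 8, 7, 6])

# Custom index supplied through the index keyword.
b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Printing shows the index on the left and the values on the right.
print(a)
print(b)
```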
A Series can be created from types such as Python lists, Python dictionaries, scalar values (a single value), ndarrays, and other functions.
Scalar value creation:
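A small sketch of scalar creation, with an assumed value and assumed labels: the scalar is repeated once for every label in the index, so the index determines the length of the result.

```python
import pandas as pd

# The scalar 25 is broadcast to every label in the index.
s = pd.Series(25, index=['a', 'b', 'c'])
print(s)
```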
Creating from a dictionary is very natural, because a Series is similar to the dictionary type: the dictionary keys become the index.
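For illustration, with an assumed dictionary, the keys end up as index labels and the dictionary values as the data:

```python
import pandas as pd

# Dictionary keys become the index, dictionary values become the data.
d = pd.Series({'a': 9, 'b': 8, 'c': 7})
print(d)
```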
When the specified index differs from the keys of the dictionary, the values are matched to the specified index by label. A label such as 'd' that has no corresponding value is marked with NaN, and the other values are changed from int64 to float64, because NaN is a floating-point value and an int64 array cannot hold it.
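The following sketch reproduces that situation under assumed values: the dictionary has keys 'a', 'b' and 'c', while the requested index also asks for 'd'.

```python
import pandas as pd

# 'd' is requested in the index but missing from the dictionary.
e = pd.Series({'a': 9, 'b': 8, 'c': 7}, index=['c', 'a', 'b', 'd'])

print(e)        # 'd' is NaN, the other values are shown as floats
print(e.dtype)  # float64, because NaN cannot be stored in an int64 array
```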
Creating from ndarray types
Not only the values but also the index can be created from an ndarray.
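A brief sketch with assumed arrays: np.arange supplies the values in both cases, and in the second case a descending np.arange also supplies the index.

```python
import numpy as np
import pandas as pd

# Values from an ndarray, automatic index 0..4.
n = pd.Series(np.arange(5))

# Values and index both from ndarrays: labels 9, 8, 7, 6, 5.
m = pd.Series(np.arange(5), index=np.arange(9, 4, -1))
print(m)
```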
Basic operations
Because the Series type consists of two parts, index and values, its operations can also be grouped into these two parts, which makes it similar to both the ndarray type and the dictionary type.
When creating a Series, the index keyword can be omitted by passing the index as the second positional argument.
For a Series, the index is obtained through .index, whose type is Index, and the data is obtained through .values, which returns an array, showing that it is a NumPy type.
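A short sketch of these two attributes, using an assumed Series b whose index is passed positionally (without the index keyword):

```python
import numpy as np
import pandas as pd

# The second positional argument is the index.
b = pd.Series([9, 8, 7, 6], ['a', 'b', 'c', 'd'])

print(b.index)   # a pandas Index holding the labels
print(b.values)  # a NumPy ndarray holding the data

print(type(b.index))
print(type(b.values))
```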
This test reveals the nature of a Series: the value part is a NumPy array, an index is created separately, and the association of the two is the Series type.
Or you can think of a Series as a kind of "new dictionary": the key is the index, and the value is NumPy data.
It follows that values can be obtained directly through the key.
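Note that even if the user defines a custom index, the automatic (positional) index still exists, so a value can also be reached by position. Older pandas versions accept b[1] for this, while recent versions deprecate positional lookup through [] in favor of b.iloc[1]. The two kinds of index cannot be mixed in one selection. To get several values at once, separate the keys with commas and wrap them in an additional pair of [].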
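The lookups described above can be sketched as follows, assuming a Series b with labels 'a' to 'd'. The positional lookup is written with .iloc here so that the example behaves the same across pandas versions.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

by_label = b['b']               # lookup by custom index label
by_position = b.iloc[1]         # lookup by automatic (positional) index
several = b[['c', 'd', 'a']]    # several labels: note the inner list

print(by_label, by_position)
print(several)
```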
Slicing operation:
A Series can be sliced by its automatic (positional) index; if there is a custom index, it is sliced along with the values. The difference is that slicing by custom index labels includes the right endpoint: in a slice that ends at 'c', the 'c' entry is still included.
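To make the endpoint difference concrete, here is a small sketch on an assumed Series b with labels 'a' to 'd':

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Positional slice: the right endpoint (position 2) is excluded.
by_position = b[0:2]

# Label slice: the right endpoint ('c') is included.
by_label = b['a':'c']

print(by_position)
print(by_label)
```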
To check whether a label is in the custom index of a Series, use the in keyword; in does not check the automatic index.
You can also use the get() method to get a value; it returns a default when the label is missing.
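Both checks can be sketched on an assumed Series b with labels 'a' to 'd'. The fallback value 100 is an arbitrary choice for illustration.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

has_c = 'c' in b    # True: 'c' is a label in the custom index
has_0 = 0 in b      # False: positions are not checked by `in`

found = b.get('a')         # value stored under label 'a'
missing = b.get('f', 100)  # label 'f' is absent, so the default is returned

print(has_c, has_0, found, missing)
```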
NumPy operations and functions can be applied to the Series type.
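A few NumPy-style operations on an assumed Series b are sketched here. Each result is again a Series that keeps the original index.

```python
import numpy as np
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

doubled = b * 2                # element-wise arithmetic
exp_b = np.exp(b)              # NumPy ufuncs accept a Series directly
above = b[b > b.median()]      # boolean filtering, as with ndarrays

print(doubled)
print(exp_b)
print(above)
```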
When two Series are combined in an operation, for example Series + Series, they are aligned automatically by index.
Values with the same index label are operated on together, and labels that appear in only one of the two Series get NaN.
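The alignment can be sketched with two assumed Series whose indexes only partly overlap ('c' and 'd' are shared):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['c', 'd', 'e'])
b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Shared labels 'c' and 'd' are added; 'a', 'b' and 'e' become NaN.
total = a + b
print(total)
```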
Both the Series object and its index can have a name, which is stored in the .name attribute. By default b has no name, so b.name displays nothing. You can specify the name of the Series object and the name of the index.
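A minimal sketch of naming, using an assumed Series b and arbitrary example names:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
default_name = b.name  # None: no name has been assigned yet

b.name = 'Series object'        # name of the Series itself
b.index.name = 'index column'   # name of the index

print(b)  # the printed output now includes both names
```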
Note that when modifying values only one pair of [] is required, which differs from retrieving several values.
How do you modify the index of a Series?
For the time being, I only know that I can reassign the whole index through .index.
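Reassigning through .index can be sketched like this, with assumed labels. The new list must have the same length as the Series.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Replace every label at once; values stay in the same order.
b.index = ['e', 'f', 'g', 'h']
print(b)
```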
It still feels inconvenient. If I only want to modify one of the labels, do I need to reassign all of them? For example, I just want to change 'h' to 't' and leave the others unchanged. I wonder if a single label can be modified on its own.
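One possible answer, which is not part of the original notes, is Series.rename: given a mapping from old labels to new labels, it returns a new Series in which only the listed labels are changed. A sketch with assumed values:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['e', 'f', 'g', 'h'])

# Only 'h' is mapped; all other labels are kept as they are.
renamed = b.rename({'h': 't'})

print(renamed)
print(b)  # the original Series is unchanged
```

Because rename returns a new object, the result has to be assigned back (b = b.rename(...)) if the change should replace the original.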