Learning pandas in depth (1): 10 minutes to pandas

pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

10 minutes to pandas

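import numpy as np
import pandas as pd

# Build a DataFrame from a dict of columns; scalar values are broadcast to all four rows
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# filter: keep only the rows whose 'E' value is 'test'
df2[df2['E'].isin(['test'])]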


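# ts supplies the date index shared by the DataFrame below
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
# plot (requires matplotlib)
df.cumsum().plot()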

Intro to data structures

Here is a basic tenet to keep in mind: data alignment is intrinsic.

The link between labels and data will not be broken unless done so explicitly by you.
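
To make the alignment rule concrete, here is a minimal sketch using two Series with made-up values and partly overlapping labels (the names `s1`, `s2` and `total` are only illustrative). Arithmetic matches values by label rather than by position, and a label found in only one operand yields NaN.

```python
import numpy as np
import pandas as pd

# Two Series whose labels only partly overlap (illustrative values).
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Addition aligns on labels, not on position.
total = s1 + s2

# 'b' and 'c' exist in both operands, so their values are summed.
# 'a' and 'd' exist in only one operand, so the result there is NaN.
print(total)
```

The result carries the union of both indexes, which is why missing labels have to be dropped or filled explicitly afterwards.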

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

The axis labels are collectively referred to as the index.

The basic method to create a Series is to call:

>>> s = pd.Series(data, index=index)

data can be many different things:

a Python dict
an ndarray
a scalar value

The passed index is a list of axis labels. This separates into a few cases depending on what data is:

# 1. From ndarray
# If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1]
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

# 2. From dict
# The Series index follows the dict's insertion order (Python >= 3.6 and pandas >= 0.23); otherwise it is the lexically sorted list of dict keys
d = {'b': 1, 'a': 0, 'c': 2}
s = pd.Series(d)
# If an index is passed, the values in data corresponding to the labels in the index will be pulled out; labels missing from the dict get NaN
s = pd.Series(d, index=['b', 'c', 'd', 'a'])

# 3. From scalar value
# If data is a scalar value, an index must be provided. The value will be repeated to match the length of index
s = pd.Series(5.0, index=['b', 'c', 'd', 'a'])
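
The three cases can be checked side by side. The sketch below is only an illustration (the names `s_arr`, `s_dict` and `s_scalar` are not from the original) and assumes a recent pandas on Python 3.7 or later, where dict insertion order is preserved.

```python
import numpy as np
import pandas as pd

# 1. ndarray without an index: labels default to 0 .. len(data) - 1.
s_arr = pd.Series(np.random.randn(5))
print(list(s_arr.index))

# 2. dict with an explicit index: values are pulled out by label.
#    'd' is not a key of the dict, so its value is NaN, which also
#    forces the integer values to be stored as float64.
d = {'b': 1, 'a': 0, 'c': 2}
s_dict = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s_dict)

# 3. scalar: the value is repeated once per label in the index.
s_scalar = pd.Series(5.0, index=['b', 'c', 'd', 'a'])
print(s_scalar)
```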

pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
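
As a small sketch of this behaviour (values and names are illustrative), a Series can be built with a repeated label and selected from without error. The failure only appears when an operation that needs unique labels is attempted; `reindex` is used here as one such operation.

```python
import pandas as pd

# The label 'a' appears twice; construction succeeds.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(s.index.is_unique)  # False

# Selecting a duplicated label returns every matching row as a Series.
dup = s['a']
print(dup)

# reindex cannot decide which 'a' to use, so it raises ValueError.
try:
    s.reindex(['a', 'b', 'c'])
    raised = False
except ValueError as exc:
    raised = True
    print(f"reindex failed: {exc}")
```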
