Pandas-01 (Installation, Introduction to Series and DataFrame)

Pandas is a Python extension library for data analysis.

Pandas is a powerful toolkit for analyzing structured data. It is built on NumPy, which provides high-performance array operations.

Pandas can import data from many sources, such as CSV, JSON, SQL databases, and Microsoft Excel files.
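As a minimal sketch of importing data, the snippet below reads CSV text with `pd.read_csv`. To keep it self-contained it wraps an inline string in `io.StringIO` instead of opening a real file; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Inline CSV text standing in for a file on disk (hypothetical sample data)
csv_text = """name,score
Alice,90
Bob,85
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```

With a real file you would pass its path instead, e.g. `pd.read_csv('data.csv')`; `pd.read_json`, `pd.read_sql` and `pd.read_excel` are the corresponding readers for the other formats.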

Pandas supports operations such as merging, reshaping, and selecting data, and it also provides data cleaning and data processing features.
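To give a feel for merging and selection, here is a small sketch using `pd.merge` and a boolean mask. The two tables and their `user`, `city` and `amount` columns are hypothetical sample data.

```python
import pandas as pd

# Two small tables that share a 'user' key column (made-up sample data)
users = pd.DataFrame({'user': ['u1', 'u2', 'u3'], 'city': ['Paris', 'Rome', 'Oslo']})
orders = pd.DataFrame({'user': ['u1', 'u1', 'u3'], 'amount': [10, 20, 5]})

# Merging: an inner join keeps only users that appear in both tables
merged = pd.merge(users, orders, on='user')

# Selection: a boolean mask keeps the rows with amount >= 10
large = merged[merged['amount'] >= 10]

print(merged)
print(large)
```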

In Pandas, the main data structures are Series (one-dimensional data) and DataFrame (two-dimensional data):

A Series is a one-dimensional array-like object consisting of a sequence of values (of any NumPy data type) and an associated set of data labels, called its index.

A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean). A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series that share a common index.
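The "dictionary of Series" view can be checked directly. In this sketch (column names and values are illustrative), each dictionary key becomes a column, selecting one column returns a Series, and every column shares the DataFrame's row index.

```python
import pandas as pd

# Build a DataFrame from a dict: each key becomes a column name
df = pd.DataFrame({
    'name': ['Tom', 'Bob', 'Ann'],      # string column
    'age': [10, 12, 13],                # integer column
    'member': [True, False, True],      # boolean column
})

col = df['age']                      # selecting one column returns a Series
print(type(col))                     # <class 'pandas.core.series.Series'>
print(df.dtypes)                     # each column keeps its own dtype
print(col.index.equals(df.index))    # True: columns share the row index
```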

Table of contents

1. Pandas installation

1.1 Install Pandas using pip

1.2 Test example

2. Series data structure

2.1 Series Introduction

2.2 Series creation examples

2.3 Series data extraction (index)

3. DataFrame

3.1 Introduction to DataFrame

3.2 DataFrame creation example

3.3 DataFrame Data Extraction (Index)

3.4 DataFrame data operations

3.4.1 Add column data

3.4.2 Column data deletion

3.4.3 Data Append

3.4.4 Delete data by index


1. Pandas installation

Like NumPy, Pandas can be installed with pip or conda. If you have already installed the Anaconda distribution, which ships with NumPy and Pandas, there is no need to install it again.

1.1 Install Pandas using pip

pip install pandas

After installation, import the pandas package to use it:

import pandas as pd
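To confirm that the installation worked, one simple check is to print the installed version through the standard `pd.__version__` attribute (with conda, the equivalent install command is `conda install pandas`).

```python
import pandas as pd

# Print the installed pandas version string
print(pd.__version__)
```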

1.2 Test example

import numpy as np
import pandas as pd

s = pd.Series([1,2,3,4,np.nan,6,8])
s
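In the test example, `np.nan` marks a missing value. Because NaN is a floating-point value, pandas stores the whole Series as `float64`, so the integers are displayed as 1.0, 2.0, and so on. A quick check:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan, 6, 8])

print(s.dtype)          # float64: the integers were upcast because of NaN
print(s.isna().sum())   # 1: exactly one missing value
print(s.index)          # RangeIndex(start=0, stop=7, step=1)
```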

2. Series data structure

2.1 Series Introduction

A Pandas Series is like a column in a table: a one-dimensional array that can hold data of any type.

A Series consists of an index and a column of values. Its constructor is:

pandas.Series(data, index, dtype, name, copy)
  • data : the data, e.g. an ndarray, list, dict, or scalar value.

  • index : the index labels; if not specified, defaults to RangeIndex (0, 1, 2, …, n).

  • dtype : the data type; inferred from the data if not specified.

  • name : the name of the Series.

  • copy : whether to copy the input data; the default is False.
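The sketch below passes the constructor parameters explicitly; the values, labels and the name `'scores'` are illustrative. The second Series shows the defaults pandas falls back on when `index` and `dtype` are omitted.

```python
import numpy as np
import pandas as pd

data = np.array([3, 1, 2])

s = pd.Series(
    data,
    index=['x', 'y', 'z'],   # custom labels instead of the default 0, 1, 2
    dtype='float64',         # force a dtype instead of inferring an integer type
    name='scores',           # the name of the Series
)
print(s)

# Without index/dtype, pandas infers them
t = pd.Series([3, 1, 2])
print(t.index)  # RangeIndex(start=0, stop=3, step=1)
```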

2.2 Series creation examples

1. Create a basic Series from an array and set custom index values:

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
s

Output result: 

100    a
101    b
102    c
103    d
dtype: object

Extract an element by its index label:

print(s[102])

Output result:

c
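Because this Series has integer labels (100 to 103), `s[102]` is looked up by label, not by position. The explicit alternatives are `.loc` for labels and `.iloc` for positions, sketched here on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(['a', 'b', 'c', 'd']), index=[100, 101, 102, 103])

print(s.loc[102])   # c  (label-based lookup)
print(s.iloc[2])    # c  (position-based lookup: the third element)
# s[2] would raise KeyError here, since 2 is not one of the integer labels
```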

2. Create a Series from key/value pairs, such as a dictionary (the keys become the index):

data = {
    'user1':100,
    'user2':200,
    'user3':250,
}
s = pd.Series(data)
s

Output result:

user1    100
user2    200
user3    250
dtype: int64

3. Create a Series from a list with custom index labels:

s = pd.Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
print(s)

Output result:

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
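A Series can also be created from a scalar: pandas repeats the single value for every label in `index`, so `index` determines the length. A minimal sketch:

```python
import pandas as pd

# The scalar 5 is broadcast to every index label
s = pd.Series(5, index=['a', 'b', 'c'])
print(s)
```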

2.3 Series data extraction (index)

To extract data from a Series, you can use positional (array-subscript) indexing or the labels set through the index parameter.

s = pd.Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
s

Index it as follows:

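s.iloc[0]         # first element by position (plain s[0] on a label index is deprecated in recent pandas)

s[0:3]            # first three elements (an integer slice is positional)

s[-3:]            # last three elements

s['a']            # element with label 'a'

s[['a','c','f']]  # several elements selected by label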

The outputs, in order, are:

1


a    1
b    2
c    3
dtype: int64


d    4
e    5
f    6
dtype: int64


1


a    1
c    3
f    6
dtype: int64
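Label slices behave differently from positional slices: with `.loc` the end label is included, while `.iloc` (like ordinary Python slicing) excludes the end position. A short sketch on the same Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = s.loc['b':'d']   # labels b, c, d: the end label is included
by_pos = s.iloc[1:3]        # positions 1 and 2 only: the end is excluded
print(by_label.tolist())    # [2, 3, 4]
print(by_pos.tolist())      # [2, 3]
```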

3. DataFrame

3.1 Introduction to DataFrame

As described in the introduction, a DataFrame is a table of ordered columns, each of which can hold its own value type (numeric, string, boolean). It has both a row index and a column index, and behaves like a dictionary of Series that share one index.

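The DataFrame constructor is:

pandas.DataFrame(data, index, columns, dtype, copy)
  • data : the data, e.g. an ndarray, Series, list, dict, or another DataFrame.

  • index : the row labels; defaults to RangeIndex (0, 1, 2, …, n) if the input data carries no index information.

  • columns : the column labels; defaults to RangeIndex (0, 1, 2, …, n) if the input data carries no column labels.

  • dtype : the data type to force; only a single dtype is allowed, and it is inferred if not given.

  • copy : whether to copy the input data; the default (None) copies dict input but not DataFrame or 2-D ndarray input.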
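The sketch below passes `index`, `columns` and `dtype` explicitly to build a small table from a 2-D array; the labels `r1`…`r3`, `x` and `y` are illustrative. The last line shows the RangeIndex default when no labels are given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(6).reshape(3, 2),   # 3 rows x 2 columns holding 0..5
    index=['r1', 'r2', 'r3'],     # row labels
    columns=['x', 'y'],           # column labels
    dtype='float64',              # a single dtype applied to all columns
)
print(df)
print(df.shape)   # (3, 2)

# Without index/columns, both default to RangeIndex
print(pd.DataFrame(np.zeros((2, 2))).columns)  # RangeIndex(start=0, stop=2, step=1)
```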

3.2 DataFrame creation example

1. Basic table creation

import pandas as pd
import numpy as np

df = pd.DataFrame()  # an empty DataFrame
# Two columns: the first holds names, the second holds ages
data = [['TOM',10],['BOB',12],['AOA',13]]
df = pd.DataFrame(data,columns=['username','age'])
df

Output result:

   username  age
0       TOM   10
1       BOB   12
2       AOA   13

2. Create from a dictionary

# Create a DataFrame from a dictionary
data = {
    "username":['小黑','小白','小刘'],
    'income':[1000,2000,3000]
}
df = pd.DataFrame(data,index=[1,2,3])
df

Output result:

3. Create from a dictionary of Series

d = {
    'one':pd.Series([1,2,3],index=['a','b','c']),
    'two':pd.Series([1,2,3,4],index=['a','b','c','d'])
}
df = pd.DataFrame(d)
df

Output result:

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

Missing entries are filled with NaN: the Series for 'one' has no label 'd'.

3.3 DataFrame Data Extraction (Index)

In the example above, the first column can be selected with df['one'], which returns it as a Series.

You can also index data with the .loc (label-based) and .iloc (position-based) indexers, or with attribute access such as df.one.
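A sketch of these access styles on the dictionary-of-Series DataFrame from 3.2, rebuilt so the snippet runs on its own. Attribute access such as `df.two` only works when the column name is a valid Python identifier that does not clash with a DataFrame method.

```python
import pandas as pd

d = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)

col = df['one']             # a column as a Series (float64 because of the NaN at 'd')
row = df.loc['b']           # the row labeled 'b' as a Series
cell = df.loc['c', 'two']   # a single value by row and column label
first = df.iloc[0]          # the first row by position
attr = df.two               # attribute access, same as df['two']
print(col, row, cell, first, attr, sep='\n\n')
```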

3.4 DataFrame data operations

3.4.1 Add column data

To add a column, assign to it directly with df['column_name'] = data:

df['three'] = pd.Series([4,5,6],index=['a','b','c'])
df['four'] = df['one']+df['three']
print(df)
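When the assigned value is a Series, pandas aligns it on the index: labels the new Series lacks (here 'd') become NaN, and arithmetic involving NaN yields NaN. A scalar, by contrast, is broadcast to every row. A self-contained sketch (the `flag` column is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
})

# The new Series has no label 'd', so that row gets NaN after alignment
df['three'] = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
# Element-wise sum; any NaN operand produces NaN
df['four'] = df['one'] + df['three']
# A scalar is broadcast to every row
df['flag'] = True
print(df)
```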

3.4.2 Column data deletion

To delete an entire column, use the del statement or the DataFrame.pop method (pop also returns the removed column):

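del df['four']    # delete column 'four' in place
df.pop('two')     # remove column 'two' and return it as a Series
print(df)

Output result:

   one  three
a  1.0    4.0
b  2.0    5.0
c  3.0    6.0
d  NaN    NaN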

3.4.3 Data Append

DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat to append rows instead:

d = {
    'one':pd.Series([1,2,3],index=['a','b','c']),
    'two':pd.Series([1,2,3,4],index=['a','b','c','d'])
}
df = pd.DataFrame(d)
df2 = pd.DataFrame([[14,15],[15,16]],columns=['one','two'],index=['e','f'])
df = pd.concat([df, df2])  # stack the rows of df2 below df
print(df)

Output result:

    one  two
a   1.0    1
b   2.0    2
c   3.0    3
d   NaN    4
e  14.0   15
f  15.0   16
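`pd.concat` also accepts `ignore_index=True`, which discards the original row labels and renumbers the result 0 to n-1. A minimal sketch, reusing the column names from the example above with smaller illustrative data:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['one', 'two'], index=['a', 'b'])
df2 = pd.DataFrame([[14, 15], [15, 16]], columns=['one', 'two'], index=['e', 'f'])

# Stack the rows of df2 below df1, keeping the original labels
kept = pd.concat([df1, df2])
# Same rows, but with a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)
print(kept.index.tolist())        # ['a', 'b', 'e', 'f']
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```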

3.4.4 Delete data by index

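DataFrame.drop(labels) deletes rows by index label. It returns a new DataFrame, so assign the result if you want to keep it.

For example, to delete the row labeled 'd' from the data above:

# Delete the row whose index label is 'd'
df.drop("d")

Output result:

    one  two
a   1.0    1
b   2.0    2
c   3.0    3
e  14.0   15
f  15.0   16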
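Since `drop` returns a new object, the original DataFrame is unchanged unless you reassign the result. The same method can remove columns through its `columns=` keyword. A short sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [1, 2, 3]}, index=['a', 'b', 'd'])

without_d = df.drop('d')              # drop the row labeled 'd'
without_two = df.drop(columns='two')  # drop a column instead of a row
print(df.index.tolist())              # ['a', 'b', 'd']: the original is unchanged
print(without_d.index.tolist())       # ['a', 'b']
print(without_two.columns.tolist())   # ['one']
```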


Origin blog.csdn.net/damadashen/article/details/126892841