Quick start with pandas
Learning objectives
- Understand the DataFrame and Series data structures
- Load csv and tsv data sets
- Distinguish the row and column labels of a DataFrame from its row and column position numbers
- Retrieve the data in specified rows and columns of a DataFrame
1. Introduction to DataFrame and Series
pandas is an open-source Python library for data analysis. It supports loading, cleaning, transforming, summarizing, and visualizing data.
pandas has two basic data structures:
1) DataFrame
- Used to process structured data (SQL tables, Excel sheets)
- It can be understood simply as a data table with row labels and column labels
2) Series
- Used to process a single column of data. A DataFrame can also be thought of as a dictionary or collection of Series objects.
- It can be understood simply as one row or one column of the data table
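The relationship between the two structures can be sketched with a small example. The table below is hypothetical sample data, not one of the course data sets, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the course data sets
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [28, 35, 41]},
    index=["r1", "r2", "r3"],  # row labels
)

age_column = df["age"]    # selecting one column gives a Series
first_row = df.loc["r1"]  # selecting one row by its label also gives a Series

print(type(df).__name__)          # DataFrame
print(type(age_column).__name__)  # Series
print(type(first_row).__name__)   # Series
print(list(df.columns))           # column labels: ['name', 'age']
print(list(df.index))             # row labels: ['r1', 'r2', 'r3']
```

Building the DataFrame from a dictionary of columns mirrors the idea that a DataFrame behaves like a collection of Series that share the same row labels.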
2. Load data sets (csv and tsv)
2.1 Introduction to csv and tsv file formats
csv and tsv are both plain-text file formats that store a two-dimensional table of data.
Note: In a csv file, the column values in each row are separated by commas; in a tsv file, the column values in each row are separated by the tab character \t.
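To make the difference concrete, the sketch below writes the same small table as csv text and as tsv text. The table is hypothetical sample data; DataFrame.to_csv returns a string when no file path is given.

```python
import pandas as pd

# Hypothetical sample data used only to show the two text formats
table = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0]})

# With no path argument, to_csv returns the file content as a string
csv_text = table.to_csv(index=False)            # comma-separated
tsv_text = table.to_csv(index=False, sep="\t")  # tab-separated

csv_lines = csv_text.splitlines()
tsv_lines = tsv_text.splitlines()

print(csv_lines[0])        # header row, columns joined by ','
print(repr(tsv_lines[0]))  # header row, columns joined by '\t'
```

Both strings hold the same rows and columns; only the separator between column values differs.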
2.2 Load data sets (tsv and csv)
1) First open Jupyter Notebook, go to the directory where you plan to write the code, and create a notebook file named 01-pandas快速入门.ipynb (pandas quick start).
Note: Place the provided data directory, which contains the data sets, in the same directory as this notebook beforehand. Later lessons load their data sets from this data directory.
2) Import the pandas package
Note: pandas is a third-party package, not part of the Python standard library, so it must be installed and then imported before use
# Import pandas in the ipynb file
import pandas as pd
3) Load the csv file data set
tips = pd.read_csv('./data/tips.csv')
tips
4) Load the tsv file data set
# The sep parameter sets the column separator of the tsv file to \t; the default value of sep is ','
china = pd.read_csv('./data/china.tsv', sep='\t')
china
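The effect of the sep parameter can be checked without the course files. The following is a minimal sketch that reads hypothetical in-memory text through io.StringIO instead of ./data/tips.csv or ./data/china.tsv; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents; the real course files are not used here
csv_text = "name,score\nAnn,90\nBob,85\n"
tsv_text = "name\tscore\nAnn\t90\nBob\t85\n"

from_csv = pd.read_csv(io.StringIO(csv_text))            # default sep=','
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated
wrong_sep = pd.read_csv(io.StringIO(tsv_text))           # tsv read with the default sep

print(from_csv.equals(from_tsv))  # True: same table from both formats
print(from_tsv.shape)             # (2, 2): two rows, two columns
print(wrong_sep.shape)            # (2, 1): each row collapsed into one column
```

If a loaded tsv file shows up as a single wide column, a missing sep='\t' argument is a likely cause.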