Author: Ai Wen, who holds a master's degree in computer science, is an in-house training lecturer, gold-medal interviewer, and senior algorithm expert, and currently works at a first-tier BAT company.
E-mail: [email protected]
Blog: https://wenjie.blog.csdn.net/
Content: Programming with Ai Wenjie, "Learning Python from Scratch for Beginners"
Learning objectives
- Series and DataFrame
- Index objects
- Time series
- Categorical data analysis
Introduction to pandas
pandas is a Python package and one of the most commonly used foundational libraries for machine learning programming in Python. This article is an introductory tutorial to it.
pandas provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data easy and intuitive. It is intended to be a high-level building block for doing practical data analysis in Python.
The core of pandas consists of two data structures: Series and DataFrame.
The two compare as follows:
- Series: a 1-dimensional labeled array holding values of a single type
- DataFrame: a 2-dimensional labeled table of variable size whose columns can hold different types
DataFrame can be regarded as a container of Series, that is, a DataFrame can contain several Series.
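The container relationship can be sketched as follows. The names and values here are made up for illustration only; the point is that each Series becomes one column, and selecting a column gives back a Series.

```python
import pandas as pd

# Two Series sharing the same index (illustrative values, not from the article)
names = pd.Series(["Ann", "Bob"], index=["a", "b"])
ages = pd.Series([30, 25], index=["a", "b"])

# Combine them into one DataFrame; each Series becomes a column
df = pd.DataFrame({"name": names, "age": ages})

# Selecting a single column returns a Series again
age_column = df["age"]
print(type(age_column).__name__)
```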
Series
A Series is a one-dimensional data structure that combines features of an array and a dictionary: it is ordered, but its elements can also be accessed with non-numeric labels.
Create Series
- The last line of the output shows the data type; for integer data the default is int64
- The second column of the output is the data
- The first column of the output is the index of the data (called the index in pandas)
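A minimal sketch of creating and printing a Series, with made-up values, so that the three parts of the output described above can be seen:

```python
import pandas as pd

# Build a Series from a plain list; pandas adds a default integer index 0..n-1
s = pd.Series([4, 7, -5, 3])

# Printing shows the index on the left, the values on the right,
# and the dtype on the last line (an integer dtype, typically int64)
print(s)

print(s.index)   # the default index
print(s.values)  # the underlying array of values
```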
Create a Series specifying the index column
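As an illustration, the `index` argument of `pd.Series` sets the labels explicitly. The labels and values below are assumptions chosen for the example.

```python
import pandas as pd

# Pass index= to label each value with a custom (here, string) label
s2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(s2)

# Values can now be looked up by label, like a dictionary
value_a = s2["a"]

# A list of labels selects several values at once
subset = s2[["c", "a"]]
print(subset)
```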
Create a Series from dict data
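A short sketch with a hypothetical dict: the keys become the index and the values become the data. If an explicit index contains a label that is missing from the dict, that entry is NaN.

```python
import pandas as pd

# Hypothetical data: dict keys become index labels
population = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000}
s3 = pd.Series(population)
print(s3)

# With an explicit index, labels missing from the dict get NaN
s4 = pd.Series(population, index=["California", "Ohio", "Texas"])
print(s4)
```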
DataFrame
A DataFrame is a table made up of an ordered collection of columns. It can be thought of as an Excel spreadsheet.
Each column can have a different value type (number, string, boolean).
A DataFrame has both a row index and a column index.
Build DataFrame
Create a DataFrame from dict data
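A minimal sketch, assuming a dict of equal-length lists with made-up values: each key becomes a column name and each list becomes that column's data.

```python
import pandas as pd

# Hypothetical data: every list must have the same length
data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}
df = pd.DataFrame(data)

print(df)
print(df.index)    # row index (default 0..n-1)
print(df.columns)  # column index
```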
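In older pandas versions, the columns of a DataFrame built from a dict were automatically sorted by column name; current versions keep the dict's insertion order.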
Simple DataFrame operations
- Add a field to a DataFrame: if there is no corresponding data for the field, its values are NaN
- Get the columns of the DataFrame
- Get one column or multiple columns of the DataFrame
- Get row data of the DataFrame
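The operations listed above can be sketched as follows. The data and the extra `debt` column are assumptions made for the example, and `iloc`/`loc` are one common way to fetch rows, not necessarily the one used originally.

```python
import pandas as pd

# Hypothetical data
data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}

# "debt" has no data in the dict, so the whole column is NaN
df = pd.DataFrame(data, columns=["year", "state", "pop", "debt"])

cols = list(df.columns)      # column names
one = df["state"]            # a single column is a Series
many = df[["year", "pop"]]   # a list of names gives a DataFrame

row_by_pos = df.iloc[0]      # row by integer position
row_by_label = df.loc[1]     # row by index label
print(df)
```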
Index objects
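A brief sketch of how an Index object behaves, using made-up labels: it holds the axis labels, cannot be modified in place, and supports membership tests.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
idx = s.index
print(idx)

# Index objects are immutable: item assignment raises TypeError
try:
    idx[0] = "z"
    mutated = True
except TypeError:
    mutated = False

# Membership tests work on the labels
has_b = "b" in idx
print(mutated, has_b)
```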
Time series
A time series is any data observed or measured at many points in time. Many time series have a fixed frequency, meaning that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once a month. Time series can also be irregular, with no fixed unit of time. How we mark and refer to time series data depends on the application; we may encounter any of the following:
- Timestamps: specific instants in time
- Fixed periods: such as the month of January 2007 or the full year 2010
- Intervals of time: indicated by a start and end timestamp. Periods can be thought of as a special case of intervals
- Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (for example, the diameter of a cookie measured each second after it is placed in the oven)
Date and time data types
The Python standard library provides the following modules for working with date and time data:
- datetime
- time
- calendar
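A small sketch of the three standard-library modules listed above. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta
import calendar
import time

# datetime: a specific moment, and arithmetic between moments
start = datetime(2011, 1, 7, 10, 30)
delta = datetime(2011, 1, 9) - start   # subtracting gives a timedelta
later = start + timedelta(days=12)     # shifting by a timedelta

# calendar: calendar-related helpers
is_leap = calendar.isleap(2012)
first_weekday, ndays = calendar.monthrange(2011, 2)  # days in Feb 2011

# time: seconds since the epoch as a float
now_seconds = time.time()

print(delta, later, is_leap, ndays)
```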
String to time conversion
- Format dates as strings and parse strings into dates
- The to_datetime method in pandas parses many different kinds of date representations
- date_range generates timestamps at daily frequency by default
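The three points above can be sketched as follows. The date strings are arbitrary examples, and note that `pd.to_datetime` reads an ambiguous string such as `"7/6/2011"` as month first by default.

```python
from datetime import datetime
import pandas as pd

# Formatting and parsing with the standard library
stamp = datetime(2011, 1, 3)
text = stamp.strftime("%Y-%m-%d")                     # datetime -> string
parsed = datetime.strptime("2011-01-03", "%Y-%m-%d")  # string -> datetime

# pandas parses several common date representations
a = pd.to_datetime("2011-07-06 12:00:00")
b = pd.to_datetime("7/6/2011")  # month/day/year by default

# date_range: five consecutive days, daily frequency by default
days = pd.date_range("2012-04-01", periods=5)
print(text, a, b)
print(days)
```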
Categorical data
Basic operations on categorical data
When a column contains repeated values, we can use unique and value_counts to extract the distinct values and to count how often each one occurs:
- The distinct values
- The number of occurrences of each category
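A minimal sketch of both operations on a made-up Series with repeated values:

```python
import pandas as pd

# Hypothetical data with repeated values
values = pd.Series(["apple", "orange", "apple", "apple"] * 2)

# Distinct values, in order of first appearance
distinct = values.unique()

# Frequency of each distinct value
counts = values.value_counts()

print(distinct)
print(counts)
```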
Analyzing categorical data in a DataFrame
- View the type of each field
- Convert a string column to a category object
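The two steps above might look like this. The column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical table with a repeated string column
df = pd.DataFrame({
    "basket_id": range(8),
    "fruit": ["apple", "orange", "apple", "apple"] * 2,
})

# View the type of each field
print(df.dtypes)

# Convert the string column to the category dtype
df["fruit"] = df["fruit"].astype("category")

# A categorical column stores the distinct categories once,
# plus an integer code per row pointing into them
categories = list(df["fruit"].cat.categories)
codes = list(df["fruit"].cat.codes)
print(df.dtypes)
```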
Let's work hard together!