Programming with Ai Wenjie "Zero-Basic Beginner Learning Python" (7): pandas Data Analysis

Author: Ai Wenjie holds a master's degree in computer science. He is an in-house training lecturer, a gold-medal interviewer and a senior algorithm expert, and currently works at a first-tier BAT company.
Blog: https://wenjie.blog.csdn.net/
Content: Programming with Ai Wenjie "Zero-Basic Beginner Learning Python"

Learning objectives

  • Series and DataFrame
  • Index objects
  • Time series
  • Categorical data analysis

Introduction to pandas

pandas is a Python package and one of the most commonly used foundational libraries for machine learning programming in Python. This article is an introductory tutorial to it.

pandas provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data easy and intuitive. It is intended to be a high-level building block for doing practical data analysis in Python.

At the core of pandas are two data structures: Series and DataFrame.

The two structures compare as follows:

  • Series: a one-dimensional labeled array of a single (homogeneous) type
  • DataFrame: a two-dimensional, size-mutable labeled table whose columns can have different (heterogeneous) types

A DataFrame can be regarded as a container of Series: each of its columns is a Series.
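Here is a minimal sketch of that container relationship. It assumes pandas is installed, and the column names and values are illustrative only. Selecting one column of a DataFrame returns a Series, and `DataFrame.items()` yields one Series per column.

```python
import pandas as pd

# Illustrative two-column table; the columns have different types
df = pd.DataFrame({"name": ["Tom", "Ann"], "age": [20, 30]})

# Selecting one column returns a Series that shares the DataFrame's row index
age = df["age"]
print(type(age))

# Each (column label, column) pair produced by items() holds a Series
for label, column in df.items():
    print(label, type(column).__name__)
```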

Series

A Series is a one-dimensional data structure that combines features of arrays and dictionaries. It is ordered like an array, but its elements can also be accessed by non-numeric labels, like a dictionary.

Create a Series

  • The first column of the printed output is the index (the labels, called the index in pandas)
  • The second column holds the data values
  • The last line shows the data type (dtype); for integer data the default is int64
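This minimal sketch shows the three parts of the printed output described above. It assumes pandas is installed, and the values are illustrative. With no index given, pandas assigns the default labels 0 to n-1.

```python
import pandas as pd

# Build a Series from a plain Python list; pandas assigns a default 0..n-1 index
s = pd.Series([4, 7, -5, 3])
print(s)          # index in the first column, values in the second, dtype on the last line

print(s.values)   # the underlying data as a NumPy array
print(s.index)    # the default index is a RangeIndex
print(s.dtype)    # int64 for integer data
```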

Create a Series with a specified index
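This sketch uses illustrative labels and values and passes the `index` parameter of `pd.Series` to replace the default integer labels. The labels can then be used like dictionary keys, one at a time or as a list.

```python
import pandas as pd

# Supply string labels instead of the default integer index
s2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(s2)

# Labels work like dictionary keys
print(s2["a"])                # -5
print(s2[["c", "a", "d"]])    # select several labels at once, in the given order
```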

Create a Series from dict data
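This minimal sketch uses illustrative numbers. The dict keys become the index, in insertion order. If an explicit `index` is also passed, only those labels are kept, and labels missing from the dict get NaN. Because NaN is a float, the integer data is upcast to float64.

```python
import pandas as pd

# Illustrative population figures keyed by state name
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}

# Dict keys become the index (in insertion order), dict values become the data
s3 = pd.Series(sdata)
print(s3)

# Passing index= keeps only the listed labels; labels missing from the dict get NaN
states = ["California", "Ohio", "Oregon", "Texas"]
s4 = pd.Series(sdata, index=states)
print(s4)           # California is NaN, Utah is dropped
print(s4.isna())    # True only for California
```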

DataFrame

A DataFrame is a table containing an ordered collection of columns. You can think of it as an Excel spreadsheet.

Each column can hold a different value type (numeric, string, boolean).

A DataFrame has both a row index and a column index.

Build a DataFrame

Create a DataFrame from dict data
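In this minimal sketch, assuming pandas is installed and using illustrative values, each dict key becomes a column label and each list of equal length supplies that column's values. The row index defaults to 0 to n-1.

```python
import pandas as pd

# Each dict key becomes a column label; each list holds that column's values
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame = pd.DataFrame(data)
print(frame)
print(frame.shape)   # (5, 3): five rows, three columns
print(frame.dtypes)  # one dtype per column
```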

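In older pandas versions, the columns of a DataFrame built from a dict were sorted automatically. Since pandas 0.23 (on Python 3.6+), they keep the dict's insertion order. To control the order explicitly, pass the columns parameter.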
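This sketch assumes pandas 0.23 or later on Python 3.6+, and the data is illustrative. It compares three orderings: the default insertion order, an explicit `columns=` order, and an alphabetical order produced with `sort_index(axis=1)`, which mimics what older versions did by default.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Nevada"],
    "year": [2001, 2002],
    "pop": [1.7, 2.4],
}

# Default: columns follow the dict's insertion order
default_cols = list(pd.DataFrame(data).columns)
print(default_cols)   # ['state', 'year', 'pop']

# columns= fixes the order explicitly
frame2 = pd.DataFrame(data, columns=["year", "state", "pop"])
print(list(frame2.columns))   # ['year', 'state', 'pop']

# sort_index(axis=1) sorts column labels alphabetically
sorted_cols = list(pd.DataFrame(data).sort_index(axis=1).columns)
print(sorted_cols)   # ['pop', 'state', 'year']
```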

Simple DataFrame operations

  • If a DataFrame is given a field (column) that does not exist in its data, there is no corresponding data for that field, so its values are NaN

  • Get the column labels of the DataFrame

  • Select one column or several columns of a DataFrame

  • Get row data from a DataFrame
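This minimal sketch covers the four operations listed above. It assumes pandas is installed, and the data and row labels are illustrative. Assigning a Series to a column aligns it on the row index, so rows without a matching label stay NaN. For rows, `loc` selects by label and `iloc` by integer position.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# "debt" is not in data, so pandas creates it and fills it with NaN
frame = pd.DataFrame(data, columns=["year", "state", "pop", "debt"],
                     index=["one", "two", "three", "four", "five"])
debt_all_nan = frame["debt"].isna().all()
print(frame)

# Assigning a Series aligns on the row index; unmatched rows stay NaN
frame["debt"] = pd.Series([-1.2, -1.5], index=["two", "four"])

print(frame.columns)            # column labels
print(frame["state"])           # one column -> Series
print(frame[["year", "pop"]])   # several columns -> DataFrame
print(frame.loc["three"])       # one row by label -> Series
print(frame.iloc[0])            # one row by integer position -> Series
```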

Index objects
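The labels of a Series or DataFrame are stored in an Index object. This sketch assumes pandas is installed and uses illustrative labels. It shows that an Index can be sliced and tested for membership like an array, but it cannot be modified in place, because item assignment raises `TypeError`.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index        # the labels are stored in an Index object
print(index)
print(index[1:])         # Index objects support slicing like arrays
print("b" in index)      # True: membership test on labels

# Index objects are immutable, so item assignment raises TypeError
try:
    index[1] = "d"
    immutable = False
except TypeError:
    immutable = True
print("Index is immutable:", immutable)
```

Because an Index cannot change in place, it is safe to share one Index between several Series or DataFrames.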

Time series

A time series is any data that is observed or measured at many points in time. Many time series have a fixed frequency, which means data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be irregular, with no fixed unit of time between data points. How you mark and refer to time series data depends on the application. You may encounter any of the following:

  • Timestamps: specific instants in time

  • Fixed periods: for example, the month of January 2007 or the full year 2010

  • Intervals of time: indicated by a start and an end timestamp. Periods can be thought of as a special case of intervals

  • Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (for example, the diameter of a cookie measured each second after it is placed in the oven)

Date and time data types

The Python standard library includes the following modules for representing date and time data:

  • datetime
  • time
  • calendar
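This sketch gives a quick tour of the three standard-library modules using illustrative dates. Subtracting two `datetime` values gives a `timedelta`, and adding a `timedelta` shifts a date. `time.time()` returns seconds since the epoch as a float, and `calendar.isleap` is one of the calendar helpers.

```python
import calendar
import time
from datetime import datetime, timedelta

# datetime: dates and times of day
start = datetime(2011, 1, 7)
end = datetime(2011, 6, 24, 8, 15)
delta = end - start                   # subtracting datetimes gives a timedelta
print(delta.days, delta.seconds)      # 168 29700
print(start + timedelta(days=12))     # 2011-01-19 00:00:00

# time: low-level clock functions, e.g. seconds since the epoch as a float
print(time.time())

# calendar: calendar-related helpers
print(calendar.isleap(2012))          # True
```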

Converting between strings and datetimes

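  • Date type formatting: datetime.strftime turns a datetime into a string, and datetime.strptime parses a string into a datetime

  • The to_datetime function in pandas parses many different kinds of date representations

  • date_range generates a sequence of timestamps at a fixed frequency, which is daily by default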
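This minimal sketch walks through the three conversions listed above. It assumes pandas is installed, and the dates are illustrative. `strftime` and `strptime` use the same format codes. `pd.to_datetime` turns missing values into `NaT` ("not a time"). With only a start and an end date, `pd.date_range` produces daily timestamps and includes both ends.

```python
from datetime import datetime

import pandas as pd

# Format a datetime as a string, then parse it back with the same format codes
stamp = datetime(2011, 1, 3)
text = stamp.strftime("%Y-%m-%d")
print(text)                                   # 2011-01-03
parsed = datetime.strptime(text, "%Y-%m-%d")
print(parsed == stamp)                        # True

# pandas.to_datetime parses a list of date strings into a DatetimeIndex;
# the missing value becomes NaT
datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00", None]
idx = pd.to_datetime(datestrs)
print(idx)

# date_range with only start and end uses the default daily frequency
rng = pd.date_range("2012-04-01", "2012-06-01")
print(len(rng))                               # 62 days, both ends included
```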

Categorical data

Basic operations on categorical data

Categorical data often contains repeated values. We can use unique to extract the distinct values from an array and value_counts to calculate how often each value occurs:

  • Number of distinct values

  • Number of occurrences of each category
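This sketch uses illustrative fruit names to show the two operations above on a Series with repeated values. `unique` returns the distinct values in order of first appearance. `nunique` counts them. `value_counts` reports how often each value occurs, sorted by count in descending order.

```python
import pandas as pd

# Illustrative data with repeated values
values = pd.Series(["apple", "orange", "apple", "apple"] * 2)

print(values.unique())        # distinct values, in order of first appearance
print(values.nunique())       # 2: number of distinct values
print(values.value_counts())  # apple 6, orange 2
```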

Categorical data in a DataFrame

  • View the type of each column

  • Convert a column of category strings to the category dtype
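In this minimal sketch, assuming pandas is installed and using an illustrative table, `df.dtypes` shows each column's type. Then `astype("category")` converts the string column to the category dtype. After conversion, pandas stores the distinct categories once, plus one integer code per row.

```python
import pandas as pd

# Illustrative table: a string column plus two numeric columns
df = pd.DataFrame({
    "fruit": ["apple", "orange", "apple", "apple"],
    "count": [3, 5, 2, 4],
    "weight": [1.2, 0.8, 1.1, 1.5],
})
print(df.dtypes)   # fruit holds strings, count is int64, weight is float64

# Convert the string column to the category dtype
df["fruit"] = df["fruit"].astype("category")
print(df.dtypes)
print(df["fruit"].cat.categories)   # the distinct categories
print(df["fruit"].cat.codes)        # integer code for each row
```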

Let's work hard together!

Origin blog.csdn.net/shenfuli/article/details/127944980