Say goodbye to pain and learn Pandas happily! Open source tutorial "Joyful-Pandas" released

Author: Geng Yuanhao, Datawhale team


Introduction: Pandas is a tool built on NumPy and was created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides many functions and methods for processing data quickly and conveniently.

Another open source project from Datawhale is here! Joyful-Pandas (as the name suggests, learning Pandas happily) was initiated by Datawhale member Geng Yuanhao. Drawing on what he learned from three classic references over more than two months, and working with the latest Pandas version, the author wrote this set of open source tutorials that walks through the main content of Pandas.

This project teaches Pandas systematically in four modules: Pandas basics, data analysis methods, data processing types, and hands-on practice. A large number of exercises and cases accompany the content, combining theory with practice to consolidate data processing and analysis skills.

Open source motivation

Before I started using Pandas, I handled almost every large table processing problem with xlrd / xlwt and Python loops. This covered nearly all of my needs, but the shortcomings were obvious: the code was slow, and it was almost never reusable.

I had tried to learn Pandas on and off in the past, but I have to say that the package is very large. Every time I used it, I felt like one of the blind men feeling the elephant, and each function has many parameters, so the learning path was not smooth. If you have just started using Pandas and are learning it piecemeal, it is very common to run into errors that are hard to fix because you do not understand the internal operations. Even if you manage to fix them, the experience is a bit frustrating.

In the fall of 2019, I came across "Pandas Cookbook" by Theodore Petrou. After a quick read, I found that it explained many concepts that had been unclear to me.

After that, I worked through the official User Guide word by word, and by the time I had read it all I had built up a big-picture view. This is a very important step: the official tutorial always tells you where the focus is.

After a period of reflection, I drew on "Python for Data Analysis" (written by the creator of Pandas), "Pandas Cookbook", and the official User Guide to write a set of tutorials organized around my own understanding of Pandas, covering its main line of content from start to finish.

In the spirit of not settling for a superficial taste, this tutorial covers the core concepts and functions of each part. Ultimately, I hope to reach the state of "what I write is what I want", which probably requires more practice and is the goal I am working toward.

As for the name of the project: I used to find working with Pandas very painful. Now it is time to switch to "Joyful-Pandas"!

Open source content

Joyful-Pandas has 11 chapters, divided into 4 modules, covering the basics of Pandas, the data types commonly encountered in data processing, and the operations involved in processing them. The contents are outlined below.


Module 1 Pandas Basics (Chapter 1)

Once you have the data, the first step is to read it in, and once the analysis is done, you need to save the results. After reading the data, what kind of object are we dealing with (a Series or a DataFrame)? This is the first important topic, so the basic operations and components of Series and DataFrame are required content.
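As a rough illustration of the read, inspect, and save cycle described above, here is a minimal sketch. The CSV text and the column names `name` and `score` are made up for this example and do not come from the tutorial; an in-memory buffer stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content, used here in place of a real file on disk
csv_text = "name,score\nAnn,90\nBob,85\n"

# Reading tabular data returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column returns a Series
scores = df["score"]

print(type(df).__name__)
print(type(scores).__name__)

# Basic components of a DataFrame: column labels, row index, shape
print(list(df.columns))
print(list(df.index))
print(df.shape)

# Saving: write the table back out as CSV text (index column omitted)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
saved_text = buffer.getvalue()
```

With a real file, the same calls take a path instead of a buffer, for example `pd.read_csv("data.csv")` and `df.to_csv("out.csv", index=False)`, where the file names are placeholders.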


Module 2 Data Analysis Methods (Chapters 2-5)

For a Series or DataFrame, Pandas offers the following four kinds of operations:

Indexing: an operation that reduces the information held in the elements corresponds to indexing;

Grouping: the data is split into groups and key information is extracted from each group, so that the information in the data is fully used;

Reshaping: the structure or form of the data changes, making it easier to process further;

Merging: an operation that adds information not belonging to the current data frame usually involves a merge.
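The four kinds of operations above can be sketched on a toy example. The `sales` and `regions` tables and all their column names are hypothetical and chosen only for illustration; the tutorial itself may use different data and methods.

```python
import pandas as pd

# Hypothetical data for illustration only
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [10, 12, 7, 9],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Indexing: keep only some rows, so the amount of information shrinks
store_a = sales.loc[sales["store"] == "A"]

# Grouping: split by store and extract one summary value per group
total_by_store = sales.groupby("store")["revenue"].sum()

# Reshaping: same information, but laid out wide (one column per quarter)
wide = sales.pivot(index="store", columns="quarter", values="revenue")

# Merging: bring in a column that lives in another data frame
merged = sales.merge(regions, on="store", how="left")

print(store_a)
print(total_by_store)
print(wide)
print(merged)
```

Seen this way, indexing reduces information, grouping and reshaping reorganize what is already there, and merging adds information from outside the frame.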

Looking at whether an operation increases or decreases the information in the data, the author breaks the four kinds of operations into three parts, which correspond to chapters 2-5 of this project, and ties together everything the official documentation says about data frame operations to help learners organize it systematically.


Module 3 Data Processing Types (Chapters 6-9)

By this point, readers have a basic understanding of the structure of the two containers, Series and DataFrame, and are familiar with the four kinds of operations, so the focus now turns to data types.

There are four special types of data involved:

Missing data

Text data

Categorical data

Time series data

These four data types correspond to chapters 6-9. In addition, the chapters on missing data and text data cover in detail the new nullable and string data types introduced in Pandas 1.0, which are also the biggest change after upgrading from Pandas 0.x.
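To make the four special data types more concrete, here is a minimal sketch, assuming Pandas 1.0 or later. The variable names and values are invented for this example. It contrasts the default handling of a missing integer with the nullable `Int64` dtype, and also shows the `string`, `category`, and datetime dtypes.

```python
import pandas as pd

# Missing data, default behaviour: a None in an integer list
# turns the whole Series into floating point with NaN
floats = pd.Series([1, None, 3])

# Missing data, nullable integer dtype: values stay integers
# and the missing entry is represented by pd.NA
ints = pd.Series([1, None, 3], dtype="Int64")

# Text data: dedicated string dtype, with .str methods skipping missing values
names = pd.Series(["ann", None, "bob"], dtype="string")
upper = names.str.upper()

# Categorical data: values drawn from a fixed set of categories
sizes = pd.Series(["S", "M", "S"], dtype="category")

# Time series data: strings parsed into datetime values
dates = pd.to_datetime(pd.Series(["2020-01-01", "2020-01-02"]))

print(floats.dtype, ints.dtype)
print(upper)
print(sizes.cat.categories)
print(dates.dt.day)
```

The point of the nullable dtypes is that a missing value no longer forces a change of the column's type, which is why the tutorial treats them alongside missing data.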


Module 4 Hands-On Practice (Chapter 10)

At the end of each of chapters 1-9, two practice problems are included to help readers consolidate what they learned in that chapter. Each problem has several sub-questions of increasing difficulty that are closely tied to the chapter's key points. In addition, Chapter 10 collects a number of comprehensive problems of varying difficulty. So far, two classic cases have been added for everyone to study and practice.


Reference answers are provided for all exercises, for completeness.

Closing remarks

In addition to the main text and the exercises, each chapter also includes a question section with 3-8 questions. These questions cover the finer details of key points, the organization of complex topics, reflections on the design of a function or Pandas object, and so on. If you think these questions over carefully after completing the exercises, I believe your mastery of Pandas will reach the next level. Finally, I sincerely hope that you can learn Pandas happily and experience the fun of data processing and analysis with Pandas.

Open source address

https://github.com/datawhalechina/joyful-pandas



Origin www.cnblogs.com/apachecn/p/12741533.html