Pandas data processing basics - filter data in specified rows or columns

        pandas has two main data structures: Series (comparable to a single row or column of data) and DataFrame (comparable to a table with multiple rows and columns).

        To make this easier to follow, this article draws analogies with Excel and SQL when operating on rows or columns.

 

1. Reindex: reindex and ix

 

The previous article explained that, after data is read, the default row index is a sequence of numbers 0, 1, 2, 3, and so on, while the column index corresponds to the field names (that is, the first row of the data). Reindexing means replacing this default index with the one you want.

 

1.1 Series

For example, given data=Series([4,5,6],index=['a','b','c']), the row index is a, b, c.

Calling data.reindex(['a','c','d','e']) returns a new Series with the index a, c, d, e.

In other words, reindex looks up each new index label in the original data and takes the matching value; labels with no match (here d and e) get NaN.
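The Series case above can be reproduced with a short sketch. It uses the same data as the text; the variable name new_data is only an illustration.

```python
import pandas as pd

data = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# reindex returns a new Series; labels missing from the original get NaN
new_data = data.reindex(['a', 'c', 'd', 'e'])
print(new_data)
# a    4.0
# c    6.0
# d    NaN
# e    NaN
# dtype: float64
```

The values become floats because NaN cannot be stored in an integer Series.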

 

1.2 DataFrame

(1) Row index modification: works the same way for a DataFrame as for a Series.

(2) Column index modification: pass the columns parameter, as in reindex(columns=['m1','m2','m3']). The logic mirrors the row case: the new column labels are matched against the original data, and any column with no match is filled with NaN.

example:
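As a minimal sketch of both cases, assume a small made-up table with rows a, b, c and only the columns m1 and m2 (the sample values are not from the original article):

```python
import pandas as pd

# Hypothetical sample data: columns m1 and m2 only
data = pd.DataFrame({'m1': [1, 2, 3], 'm2': [4, 5, 6]},
                    index=['a', 'b', 'c'])

# (1) Row reindex: row d does not exist, so it is filled with NaN
rows = data.reindex(['a', 'c', 'd'])

# (2) Column reindex: column m3 does not exist, so it is filled with NaN
cols = data.reindex(columns=['m1', 'm2', 'm3'])

print(rows)
print(cols)
```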

 

 (3) The row and column indexes can also be modified at the same time.
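The original example for this step is not reproduced in the text, so the following is only one possible way to do it: passing both index and columns to reindex. The sample table is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'],
                    columns=['m1', 'm2', 'm3'])

# Reindex rows and columns in one call; row d has no match and becomes NaN
both = data.reindex(index=['a', 'b', 'd'], columns=['m1', 'm3'])
print(both)
```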

 

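2. Drop entries on the specified axis (in plain terms, delete rows or columns): drop

Select the rows or columns to delete by index label. By default, drop returns a new object and leaves the original data unchanged.

data.drop(['a','c']) is equivalent to delete from a where xid='a' or xid='c'

data.drop('m1', axis=1) is, by analogy, equivalent to delete from a where yid='m1' (in real SQL, removing a column would be alter table a drop column m1)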
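A small sketch of both calls, using a made-up table (the values are assumptions, only the labels a, c and m1 come from the text):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a', 'b', 'c', 'd'],
                    columns=['m1', 'm2', 'm3'])

# Delete rows a and c (axis=0 is the default)
rows_dropped = data.drop(['a', 'c'])

# Delete column m1
cols_dropped = data.drop('m1', axis=1)

print(rows_dropped)
print(cols_dropped)
```

Neither call changes data itself; each returns a new DataFrame.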

 

3. Select and filter (in plain terms, query with conditions as in SQL)

Because pandas objects carry both row and column indexes, filtering data is convenient.

 

3.1 Series

(1) Select by row index, for example:

  • obj['b'] is equivalent to select * from tb where xid='b'
  • obj[['b','a','c']] is equivalent to select * from tb where xid in ('a','b','c'), except that the results are returned in the requested order b, a, c. This is the difference from SQL.
  • obj[0:1] and obj['a':'b'] differ as follows: the former (a positional slice) does not include the end, while the latter (a label slice) does include the end.
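The selections in the list above can be sketched as follows. The sample values are made up, and the two slices are written with the explicit .iloc and .loc accessors, which I am assuming express the same positional and label cases as obj[0:1] and obj['a':'b'].

```python
import pandas as pd

# Hypothetical sample data
obj = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

one = obj['b']                  # a single value: 2.0
several = obj[['b', 'a', 'c']]  # returned in the requested order b, a, c

by_position = obj.iloc[0:1]     # positional slice: end excluded, only a
by_label = obj.loc['a':'b']     # label slice: end included, a and b

print(several)
print(by_position)
print(by_label)
```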

 

(2) Filter by value: obj[obj>-0.6] selects the records in obj whose value is greater than -0.6.
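A quick sketch of how this works: the comparison produces a Boolean Series, and indexing with it keeps only the True entries. The sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample data
obj = pd.Series([-1.2, 0.3, -0.5, -0.8], index=['a', 'b', 'c', 'd'])

mask = obj > -0.6      # Boolean Series: False, True, True, False
filtered = obj[mask]   # keeps b (0.3) and c (-0.5)
print(filtered)
```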

 

3.2 DataFrame

(1) Select a single row with ix or xs (note that ix was deprecated in pandas 0.20 and removed in pandas 1.0; loc and iloc take its place):

         For example, the row with index b can be selected in the following three ways
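The original three calls are not reproduced in the text. As a sketch for current pandas, here are three ways that do not rely on ix; they are not necessarily the same three the article had in mind, and the sample table is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'],
                    columns=['m1', 'm2', 'm3'])

row_loc = data.loc['b']    # by label
row_xs = data.xs('b')      # cross-section by label
row_iloc = data.iloc[1]    # by position (second row)

print(row_loc)
```

Each call returns row b as a Series indexed by the column names.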

 

 (2) Select multiple rows:

       For example, to select the two rows with index a and b

     #Note that this cannot be written directly as data[['a','b']], because a list inside [] selects columns, not rows

      data[0:2] selects the first and second rows. Positions count from 0, and the end position 2 is excluded.
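A sketch of the multi-row cases, assuming a made-up table and using loc in place of ix for the label-based selection:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'],
                    columns=['m1', 'm2', 'm3'])

rows_by_label = data.loc[['a', 'b']]  # rows a and b by label
rows_by_position = data[0:2]          # first two rows, end excluded

# data[['a', 'b']] looks for columns named a and b, so it fails here
try:
    data[['a', 'b']]
    raised = False
except KeyError:
    raised = True

print(rows_by_label)
print(raised)
```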

(3) Select a single column

        Select all rows of column m1
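A minimal sketch, with a made-up table; both forms below return column m1 as a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'],
                    columns=['m1', 'm2', 'm3'])

col = data['m1']              # shortest form
col_loc = data.loc[:, 'm1']   # all rows, column m1

print(col)
```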

 

(4) Select multiple columns

       Select all rows of the two columns m1 and m3

       In ix[:,['m1','m2']], the leading : means that all rows are selected.
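A sketch of the multi-column selection on a made-up table, with loc standing in for ix (an assumption for current pandas, where ix is no longer available):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'],
                    columns=['m1', 'm2', 'm3'])

cols = data[['m1', 'm3']]               # a list inside [] selects columns
cols_loc = data.loc[:, ['m1', 'm3']]    # ":" keeps every row

print(cols)
```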

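(5) Filter rows or columns by a value condition

         For example, selecting all records where a column's value is greater than 4 is equivalent to select * from tb where column_name>4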
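As a sketch, assume a made-up table and the condition m1 > 4 (the column name is chosen for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column m1 holds 0, 3, 6, 9
data = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a', 'b', 'c', 'd'],
                    columns=['m1', 'm2', 'm3'])

# Keep only the rows whose m1 value is greater than 4
result = data[data['m1'] > 4]
print(result)
```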

 

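 (6) To select all records where a column's value is greater than 4 while displaying only some of the columns

    Rows are filtered by the condition, and columns are filtered with [0,2], which selects the first and third columns.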
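The original call is not shown in the text and presumably relied on ix. One way to get the same result in current pandas, offered only as a sketch on made-up data, is to turn the positions [0,2] into column labels and pass them to loc together with the row condition:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column m1 holds 0, 3, 6, 9
data = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a', 'b', 'c', 'd'],
                    columns=['m1', 'm2', 'm3'])

mask = data['m1'] > 4              # row condition
wanted = data.columns[[0, 2]]      # first and third columns: m1, m3

result = data.loc[mask, wanted]
print(result)
```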
