Data Analysis Review 1: Summary of Data Processing in Pandas

0. Foreword: I took only brief notes when I first learned pandas, and I later found that using it was still laborious and that many details were easy to forget. So I decided to consolidate those earlier notes and, following the way I actually use pandas, organize the key points for easy lookup and recall later.

1. The difference between Series and DataFrame data in pandas:

  • Series data in pandas: a Series is a one-dimensional array made up of an index and the corresponding data. When creating one, you can pass only the data and omit the index. Both the index and the data are passed in as lists. The index defaults to integers starting from 0, but you can also pass your own index list, whose entries may be strings as well as numbers.
  • In practice, DataFrame is usually used directly with pandas, because it can also represent one-dimensional data, and a one-dimensional (single-row or single-column) DataFrame can be converted into a Series.

  • DataFrame data in pandas: a DataFrame is a two-dimensional array made up of a row index (index), a column index (columns), and two-dimensional data (data). All of these are passed in as lists, and the row and column indexes may be strings as well as numbers.

Note: A DataFrame has many attributes. The most useful ones are .values for all values, .columns for all column names, .index for all row names, and .shape for the number of rows and columns. It also has many useful methods: .info() shows details such as the data type and non-null count of each column, and methods such as .count(), .max(), .mean(), and .var() give the non-null count, maximum, mean, and variance of each column.
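
The original screenshots for this part are not available, so the following is only a minimal sketch of creating a Series and a DataFrame and reading the attributes mentioned in the note. The names and scores are made-up sample data, not the author's.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
s_default = pd.Series([110, 95, 102])  # index defaults to 0, 1, 2
s_named = pd.Series([110, 95, 102], index=["Zhang Fei", "Guan Yu", "Liu Bei"])

df = pd.DataFrame(
    data=[[110, 105, 99], [95, 120, 88], [102, 98, 115]],
    index=["Zhang Fei", "Guan Yu", "Liu Bei"],  # row labels
    columns=["Chinese", "Math", "English"],     # column labels
)

print(df.values)   # all values as a 2-D NumPy array
print(df.columns)  # column labels
print(df.index)    # row labels
print(df.shape)    # (number of rows, number of columns)

df.info()          # prints the dtype and non-null count of each column
print(df.count())  # non-null values per column
print(df.max())    # maximum of each column
print(df.mean())   # mean of each column
print(df.var())    # sample variance of each column
```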

  • Mutual conversion between Series data and DataFrame data

Note: To convert DataFrame data to a Series, simply extract a single row or column with loc or iloc, without wrapping the row or column label in list brackets ([ ]). See the data extraction sections below.
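
A small sketch of the conversion in both directions, using made-up data. Using to_frame() for the Series-to-DataFrame direction is an assumption on my part; the original screenshot may have used another approach such as pd.DataFrame(s).

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {"Chinese": [110, 95], "Math": [105, 120]},
    index=["Zhang Fei", "Guan Yu"],
)

# DataFrame -> Series: select one row or one column with a bare label or position
row = df.loc["Zhang Fei"]  # Series indexed by the column labels
col = df.iloc[:, 0]        # Series indexed by the row labels

# Series -> DataFrame: to_frame() returns a one-column DataFrame
col_df = col.to_frame()

# Wrapping the label in a list keeps the result a DataFrame from the start
row_df = df.loc[["Zhang Fei"]]

print(type(row).__name__, type(col_df).__name__, type(row_df).__name__)
```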



2. Addition, deletion, modification and query of Series data:

  • Query: look up values by index label or by position.

  • Add:

    • Adding that produces a new Series object
    • Adding in place on the original Series object (★★★★★)
  • Delete: use the drop function. The first argument is a list of the index labels to delete. Remember to also set inplace=True, otherwise the original Series is not modified.

  • Modify: locate the element as in a query, then assign a new value to it.
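
The four Series operations above might look like the sketch below. The data is made up, and using pd.concat for the "new object" case is an assumption: the original screenshot is lost, and older pandas versions also offered Series.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
s = pd.Series([110, 95, 102], index=["Zhang Fei", "Guan Yu", "Liu Bei"])

# Query: by label, by position, or by a list of labels
one = s["Guan Yu"]                  # a single value
first = s.iloc[0]                   # value at position 0
some = s[["Zhang Fei", "Liu Bei"]]  # a smaller Series

# Add, producing a new Series (the original is left unchanged)
s_new = pd.concat([s, pd.Series([88], index=["Zhao Yun"])])

# Add in place on the original Series
s["Zhao Yun"] = 88

# Delete: the first argument is a list of index labels
s.drop(["Liu Bei"], inplace=True)

# Modify: locate the element as in a query, then assign
s["Guan Yu"] = 100

print(s)
```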



3. For DataFrame data, use the loc method to extract data: loc extracts data by row labels and column labels. When a slice (interval) is used, both endpoints are included (closed on the left and on the right).

  • Extract one row: a row is one-dimensional, so the result can be either a Series or a DataFrame. The difference is whether the row label is wrapped in list brackets ([ ]): a bare label gives a Series, a label in a list gives a DataFrame.

  • Extract one column: a column is also one-dimensional, so the result can be either a Series or a DataFrame. The difference is whether the column label is wrapped in list brackets ([ ]).

  • Extract the value at a given row and column: with no list brackets you get the value itself, with brackets around one of the labels you get a Series, and with brackets around both you get a DataFrame.

  • Extract multiple rows and columns: there are two cases, extracting a range of rows and columns with slices, and extracting specific rows and columns with lists of labels.
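
The loc cases above can be sketched as follows. The DataFrame is made-up sample data, and the shapes in the comments follow from it.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    [[110, 105, 99], [95, 120, 88], [102, 98, 115], [90, 91, 92]],
    index=["Zhang Fei", "Guan Yu", "Liu Bei", "Zhao Yun"],
    columns=["Chinese", "Math", "English"],
)

# One row: a bare label gives a Series, a label in a list gives a DataFrame
row_s = df.loc["Zhang Fei"]
row_df = df.loc[["Zhang Fei"]]            # shape (1, 3)

# One column: same rule for the column label
col_s = df.loc[:, "Math"]
col_df = df.loc[:, ["Math"]]              # shape (4, 1)

# One cell: no brackets -> value, one -> Series, two -> DataFrame
value = df.loc["Zhang Fei", "Math"]       # 105
val_s = df.loc[["Zhang Fei"], "Math"]     # Series of length 1
val_df = df.loc[["Zhang Fei"], ["Math"]]  # shape (1, 1)

# Label slices include BOTH endpoints
block = df.loc["Guan Yu":"Zhao Yun", "Chinese":"Math"]              # shape (3, 2)

# Specific rows and columns via lists of labels
picked = df.loc[["Zhang Fei", "Zhao Yun"], ["Chinese", "English"]]  # shape (2, 2)
```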



4. For DataFrame data, use the iloc method to extract data: iloc extracts data by row position and column position, counting from 0. When a slice (interval) is used, the start is included and the end is excluded (closed on the left, open on the right).

  • Extract one row: a row is one-dimensional, so the result can be either a Series or a DataFrame. The difference is whether the row position is wrapped in list brackets ([ ]), following the same rule as loc.

  • Extract one column: a column is one-dimensional, so the result can be either a Series or a DataFrame. The difference is whether the column position is wrapped in list brackets ([ ]), following the same rule as loc.

  • Extract the value at a given row and column position: with no list brackets you get the value itself, with brackets around one position you get a Series, and with brackets around both you get a DataFrame, just as with loc.

  • Extract multiple rows and columns: as with loc, there are two cases, extracting a range of rows and columns with slices, and extracting specific rows and columns with lists of positions.
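
For comparison with loc, here is the same kind of sketch using iloc on made-up sample data. Note how the slice 1:3 covers positions 1 and 2 only.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    [[110, 105, 99], [95, 120, 88], [102, 98, 115], [90, 91, 92]],
    index=["Zhang Fei", "Guan Yu", "Liu Bei", "Zhao Yun"],
    columns=["Chinese", "Math", "English"],
)

# One row / one column: a bare position gives a Series, a list gives a DataFrame
row_s = df.iloc[0]
row_df = df.iloc[[0]]         # shape (1, 3)
col_s = df.iloc[:, 1]
col_df = df.iloc[:, [1]]      # shape (4, 1)

# One cell: no brackets -> value, one -> Series, two -> DataFrame
value = df.iloc[0, 1]         # 105
val_s = df.iloc[[0], 1]       # Series of length 1
val_df = df.iloc[[0], [1]]    # shape (1, 1)

# Position slices EXCLUDE the end: rows 1 and 2, columns 0 and 1
block = df.iloc[1:3, 0:2]     # shape (2, 2)

# Specific rows and columns via lists of positions
picked = df.iloc[[0, 3], [0, 2]]  # shape (2, 2)
```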



5. Addition, deletion, modification and query of DataFrame data:

  • Query
    Querying DataFrame data is essentially data extraction, so the loc and iloc methods from sections 3 and 4 above apply.

  • Add: use loc to add data. iloc cannot be used for this, because it cannot enlarge the DataFrame and raises an error.

    • Add column data: you can use loc or direct indexing (df['name'] = ...). loc is recommended because it can also add row data, so there is only one method to remember.
    • Add row data: among the indexers, only loc can add a row. Just pass the new row label to loc and assign the values.
    • To sum up, loc can be used to add data to a DataFrame, but only at the end. To insert at a specified position you need other methods. Inserting a column is relatively simple: use the insert function. Inserting a row is more involved: slice the DataFrame and concatenate the pieces around the new row.
  • Delete: use the drop function. It can delete columns or rows. In either case, remember to set inplace=True (or assign the result back), otherwise the original data is not modified.

    • Delete column data: there are two ways. One is to pass the column labels through the labels parameter together with axis=1. The other is to pass the column labels directly through the columns parameter.
    • Delete row data: likewise, either pass the row labels through the labels parameter with axis=0 (the default), or pass them through the index parameter.
  • Modify

    • Modify the column index: call the rename function with the columns parameter set to a dictionary whose keys are the original column names and whose values are the new column names.

    • Modify the row index: call the rename function with the index parameter set to a dictionary whose keys are the original row names and whose values are the new row names.

    • Modify column data: assign new values directly to the column. Both iloc and loc can be used, for example:
      df.loc[:, 'Chinese'] = [115, 108, 112, 118, 115]

    • Modify row data: assign new values directly to the row. Both iloc and loc can be used, for example:
      df.loc['Zhang Fei'] = [120, 115, 109, 105]

    • Modify a specific value:
      df.iloc[0, 0] = 115
      df.loc['Zhang Fei', 'Chinese'] = 115
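
Since the screenshots for adding, deleting, and renaming are missing, the sketch below walks through those steps on a small made-up DataFrame. All names and values are invented, and using pd.concat for the "slice then connect" row insertion is my assumption about what the original showed.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    [[110, 105], [95, 120], [102, 98]],
    index=["Zhang Fei", "Guan Yu", "Liu Bei"],
    columns=["Chinese", "Math"],
)

# iloc cannot enlarge the DataFrame: position 3 does not exist yet
iloc_error = None
try:
    df.iloc[3] = [90, 91]
except IndexError as err:
    iloc_error = str(err)

# Add columns at the end: loc or direct indexing
df.loc[:, "English"] = [99, 88, 115]
df["Physics"] = [80, 85, 90]

# Add a row at the end: pass a new row label to loc
df.loc["Zhao Yun"] = [90, 91, 92, 93]

# Insert a column at position 1
df.insert(1, "History", [70, 75, 80, 85])

# Insert a row at position 1: slice, then concatenate around the new row
new_row = pd.DataFrame([[100, 60, 100, 100, 100]],
                       index=["Ma Chao"], columns=df.columns)
df = pd.concat([df.iloc[:1], new_row, df.iloc[1:]])

# Delete columns: labels with axis=1, or the columns parameter
df.drop(labels=["History"], axis=1, inplace=True)
df.drop(columns=["Physics"], inplace=True)

# Delete rows: labels with axis=0, or the index parameter
df.drop(labels=["Ma Chao"], axis=0, inplace=True)
df.drop(index=["Zhao Yun"], inplace=True)

# Rename column and row labels with {old: new} dictionaries
df.rename(columns={"Chinese": "Literature"}, inplace=True)
df.rename(index={"Zhang Fei": "Zhang Yide"}, inplace=True)

print(df)
```

Without inplace=True, each drop or rename call returns a modified copy, so the equivalent form is df = df.drop(...) or df = df.rename(...).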




Origin blog.csdn.net/sz1125218970/article/details/131437829