[python] [pandas] Basic Operations of DataFrame

Background

Experiments often produce data that needs to be saved in a file that is easy to inspect later. Since most of this data consists of vectors, putting it in a pandas DataFrame and saving that to a CSV file is the easiest way.
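
As a minimal sketch of that workflow, a DataFrame can be written with to_csv and read back with read_csv. The column names, values, and file name below are made up for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical experiment results: one vector per column.
results = pd.DataFrame({"step": [1, 2, 3], "loss": [0.9, 0.5, 0.3]})

# Write to a CSV file in a temporary directory.
# index=False leaves the row labels out of the file.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "results.csv")
results.to_csv(csv_path, index=False)

# Read the file back to check that the round trip keeps the data.
restored = pd.read_csv(csv_path)
print(restored)
```

Passing index=False is a choice, not a requirement: without it the row labels are written as an extra first column.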

Basic Operations

The following figure shows some basic DataFrame concepts; as you can see, the basic structure matches that of a CSV file.

1. Create a DataFrame

There are usually two ways to create a DataFrame, from a dict and from a list:

  • From a dict: the keys become the column names, as follows:

    >>> import pandas as pd
    >>> d = {'col1': [1, 2], 'col2': [3, 4]}
    >>> df = pd.DataFrame(data=d)
    >>> df
       col1  col2
    0     1     3
    1     2     4
  • From a list: the column names default to the integers 0 to n-1:

    >>> d = [2, 3, 4, 5]
    >>> df = pd.DataFrame(data=d)
    >>> df
       0
    0  2
    1  3
    2  4
    3  5

    Of course, you can also specify the column names:

    >>> import numpy as np
    >>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    ...                    columns=['a', 'b', 'c'])
    >>> df2
       a  b  c
    0  1  2  3
    1  4  5  6
    2  7  8  9

Note: for numbers without a decimal point, the default dtype of a DataFrame is typically int64. If you need a different dtype, you can declare it when the DataFrame is created:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
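
The result can be checked through the dtypes attribute. The sketch below also uses astype, which is not mentioned above, as one way to convert a DataFrame after it has been created:

```python
import numpy as np
import pandas as pd

d = {"col1": [1, 2], "col2": [3, 4]}

# Declare the dtype at creation time.
small = pd.DataFrame(data=d, dtype=np.int8)
print(small.dtypes)

# Convert an existing DataFrame; astype returns a new object
# and leaves the original unchanged.
default = pd.DataFrame(data=d)
converted = default.astype(np.int8)
print(converted.dtypes)
```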

2. Select rows and columns

Selection can be divided into single-/multiple-row lookup and single-/multiple-column lookup; the idea is the same in both cases.

Single- and multiple-row lookups are performed with loc, for example:

>>> data = pd.read_csv("nba.csv", index_col ="Name")
>>> data.loc["Avery Bradley"]  # select one row
>>> data.loc[["Avery Bradley", "R.J. Hunter"]]  # select multiple rows

Note that the data must be indexed first (here via index_col); if no index is specified, the default index is the integers 0 to n-1.
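
Since nba.csv is not included here, the following sketch uses a small hand-made table as a stand-in (its columns and values are hypothetical) to show loc with the default index and with a name index:

```python
import pandas as pd

# Hypothetical stand-in for nba.csv.
players = pd.DataFrame(
    {"Name": ["Avery Bradley", "Jae Crowder", "R.J. Hunter"],
     "Number": [0, 99, 28],
     "Position": ["PG", "SF", "SG"]}
)

# Without an explicit index, rows are labelled 0, 1, 2.
first = players.loc[0]

# set_index plays the role that index_col plays in read_csv.
by_name = players.set_index("Name")
one_row = by_name.loc["Avery Bradley"]                    # a Series
two_rows = by_name.loc[["Avery Bradley", "R.J. Hunter"]]  # a DataFrame
print(two_rows)
```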

Single- and multiple-column lookups are simpler: the columns can be selected directly by subscript. My guess is that a DataFrame is stored column-first internally. Examples are as follows:

>>> data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
...         'Age': [27, 24, 22, 32],
...         'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
...         'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
>>> df = pd.DataFrame(data)
>>> df['Name']  # select all data in the Name column
>>> df[['Name', 'Address']]  # select the Name and Address columns
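
One detail worth noting is the return type: a single label gives a Series, while a list of labels gives a DataFrame. A small sketch using part of the data above:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"],
                   "Age": [27, 24],
                   "Address": ["Delhi", "Kanpur"]})

names = df["Name"]                # single label -> Series
subset = df[["Name", "Address"]]  # list of labels -> DataFrame

print(type(names).__name__)
print(type(subset).__name__)
```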

To look up rows by integer position, the way an array is indexed, use iloc, for example:

>>> data = pd.read_csv("nba.csv", index_col ="Name")
>>> row2 = data.iloc[3]  # select the 4th row
>>> row2 = data.iloc[[3, 5, 7]]  # select multiple rows

Selecting a sub-block of the matrix works in a similar way, for example:

>>> data = pd.read_csv("nba.csv", index_col ="Name")
>>> block = data.iloc[[3, 4], [1, 2]]  # rows at positions 3 and 4, columns at positions 1 and 2
>>> cols = data.iloc[:, [1, 2]]  # all rows, columns at positions 1 and 2
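
The same positional selections can be tried without nba.csv. This sketch uses a made-up 4x3 frame filled with 0 to 11 so that the positions are easy to follow:

```python
import numpy as np
import pandas as pd

# Rows are [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11].
grid = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

row = grid.iloc[3]                 # 4th row, as a Series
rows = grid.iloc[[0, 2]]           # 1st and 3rd rows
block = grid.iloc[[2, 3], [1, 2]]  # last two rows, last two columns
cols = grid.iloc[:, [1, 2]]        # all rows, last two columns
print(block)
```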

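3. Adjust the column order and row labels

If the DataFrame is created from a dict, older versions of pandas (before 0.23, or running on Python earlier than 3.6) arrange the columns alphabetically, while the order in which they were added is usually what is wanted. Newer versions keep the insertion order. In either case, the column order can be adjusted as follows: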

>>> data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
...         'Age': [27, 24, 22, 32],
...         'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
...         'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
>>> df = pd.DataFrame(data)
>>> df = df[['Name', 'Age', 'Address', 'Qualification']]  # list the columns in the desired order

There are other needs as well, such as changing the row labels, for example making them start from 1 or turning them into dates. This can be done as follows:

>>> df = pd.DataFrame(data)
>>> df.index = df.index + 1  # row labels start from 1
>>> df.index = pd.date_range('20190101', periods=len(df))  # row labels are dates
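
A self-contained sketch of both adjustments, using a shortened version of the data above. The lookup by pd.Timestamp at the end is added here only to show that the new labels can be used with loc:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"], "Age": [27, 24, 22]})

# Shift the default 0-based labels so they start at 1.
df.index = df.index + 1
shifted_labels = df.index.tolist()
print(shifted_labels)  # [1, 2, 3]

# Replace the labels with consecutive dates, one per row.
df.index = pd.date_range("20190101", periods=len(df))
second_name = df.loc[pd.Timestamp("2019-01-02"), "Name"]
print(second_name)  # Princi
```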

References

[1]. https://www.geeksforgeeks.org/python-pandas-dataframe/

[2]. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Origin www.cnblogs.com/wildkid1024/p/11093199.html