Python data processing: using the pandas module (2)

Data transformation operations

Importing files

import numpy as np
import pandas as pd
odata = pd.read_csv('example.csv')

These three lines import the CSV file; pay attention to the path, which must point to the file's actual location.

Delete rows

Data1 = odata.drop([16, 17])

drop() method

If the parameter inplace=True is not set, drop() only removes the rows in the new data block it returns; the corresponding rows of the original data block are untouched. In other words, inplace is what makes the deletion apply to the original data. Usage with inplace=True:

odata.drop(odata.index[[16,17]],inplace=True)

Note the difference between using and not using inplace. Without it, a second variable (Data1 above) holds the processed result; with inplace=True, the call operates on the original data directly. It is worth noting that inplace does not modify the original file, so it is safe: although the rows are deleted from the original data block, the deletion only happens to the variable in memory, never to the file on disk.
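A minimal runnable sketch of the two forms, using a toy frame in place of example.csv:

```python
import pandas as pd

# Toy frame standing in for the CSV data (hypothetical values).
odata = pd.DataFrame({'value': range(20)})

# Without inplace: the original is untouched, the result goes to a new variable.
Data1 = odata.drop([16, 17])
print(len(odata))   # still 20
print(len(Data1))   # 18

# With inplace=True: the original object is modified in memory;
# the CSV file on disk would never be changed.
odata.drop(odata.index[[16, 17]], inplace=True)
print(len(odata))   # 18
```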
Delete a column

del odata['date']

As the code above shows, del removes the column directly. Note that only one column name may appear in the square brackets: del can delete only one column at a time.
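A short sketch, with a hypothetical frame, showing the one-column-at-a-time restriction:

```python
import pandas as pd

# Hypothetical frame with a 'date' column.
data = pd.DataFrame({'date': ['2020-01-01'], 'value': [1], 'flag': [True]})

del data['date']               # removes exactly one column, in place
# del data['value', 'flag']    # would raise a KeyError: del takes one column at a time
print(list(data.columns))      # ['value', 'flag']
```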

pop() method

The pop() method pops the selected column out of the original data block; the original data block no longer retains that column.

Data1 = odata.pop('latitude')

pop() extracts a single column, which is very handy when one particular column is of special interest.
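For example, popping a hypothetical latitude column:

```python
import pandas as pd

data = pd.DataFrame({'latitude': [52.5, 48.9], 'longitude': [13.4, 2.3]})

lat = data.pop('latitude')   # returns the column as a Series...
print(list(data.columns))    # ...and removes it from the frame: ['longitude']
print(lat.tolist())          # [52.5, 48.9]
```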

Use of split()

Simple Python string splitting. During data preprocessing we often meet strings in which the data is interleaved with various separator symbols, yet the pieces have to be processed individually at run time. Python's built-in split() handles this.

url = 'www.google.com'   # avoid the name str, which shadows the built-in type
print(url)
url_split = url.split('.')
print(url_split)

# The output is:
# www.google.com
# ['www', 'google', 'com']

To limit the number of splits, pass a second argument to split():

url_split = url.split('.', 1)
# The result is:
# ['www', 'google.com']

That is, the string is split only at the first '.'; the rest is left intact.
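In pandas itself, the vectorized counterpart is Series.str.split; a sketch with made-up host names:

```python
import pandas as pd

hosts = pd.Series(['www.google.com', 'mail.example.org'])

# Split every value on '.'; expand=True spreads the parts into columns.
parts = hosts.str.split('.', expand=True)
print(parts[0].tolist())   # ['www', 'mail']

# n=1 limits the number of splits, like str.split('.', 1).
first = hosts.str.split('.', n=1, expand=True)
print(first[1].tolist())   # ['google.com', 'example.org']
```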

Data frame merge concat() operation

concat() is a function in pandas that merges data along a chosen axis in a straightforward way:
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False)

Parameter description
objs: a sequence (e.g. a list) of Series, DataFrame, or Panel objects
axis: the axis to merge along; 0 means rows, 1 means columns
join: the join mode, 'inner' or 'outer'

Tables with the same fields are placed end to end


# First put the tables into a list, then pass that list to concat
In [4]: frames = [df1, df2, df3]

In [5]: result = pd.concat(frames)
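Since df1/df2/df3 are not defined here, a self-contained sketch with small stand-in frames:

```python
import pandas as pd

# Small stand-ins for the df1/df2 of the pandas documentation examples.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

result = pd.concat([df1, df2])   # stacked end to end along axis=0
print(result.shape)              # (4, 2)
print(result['A'].tolist())      # ['A0', 'A1', 'A2', 'A3']
```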

To add a key level that records which table each row came from, pass the keys parameter:

result = pd.concat(frames, keys=['x', 'y', 'z'])

The keys become the outer level of a hierarchical index on the result, identifying each source table.

Horizontal table stitching (row alignment)

With axis=1, concat aligns the tables on their row index and places them side by side, so two tables with different column names can be merged:

result = pd.concat([df1, df4], axis=1)


join

The join parameter controls the alignment: 'inner' keeps only the intersection of the two tables' indexes, while 'outer' (the default) keeps their union.

result = pd.concat([df1, df4], axis=1, join='inner')
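A runnable sketch contrasting the two modes, with small stand-ins for df1 and df4:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df4 = pd.DataFrame({'B': ['B2', 'B3']}, index=[2, 3])

# outer (default): union of the row indexes, gaps filled with NaN
outer = pd.concat([df1, df4], axis=1)
print(outer.index.tolist())   # [0, 1, 2, 3]

# inner: intersection of the row indexes only
inner = pd.concat([df1, df4], axis=1, join='inner')
print(inner.index.tolist())   # [2]
```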


join_axes

If join_axes is passed, you can specify which axis the data is aligned against. For example, passing df1's index keeps exactly the specified axis of df1, and df4 is spliced against it. (Note: join_axes was deprecated and later removed from pandas; newer versions obtain the same effect with reindex.)

result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
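Because join_axes is gone from current pandas, a sketch of the equivalent using reindex (with stand-in frames):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df4 = pd.DataFrame({'B': ['B1', 'B2']}, index=[1, 2])

# join_axes=[df1.index] was removed from pandas; reindexing the
# concatenated result to df1's index achieves the same alignment.
result = pd.concat([df1, df4], axis=1).reindex(df1.index)
print(result.index.tolist())   # [0, 1]
```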


append

append is a method of Series and DataFrame. By default it stacks along the rows (axis=0), aligning on the columns. (Note: append was deprecated and removed in pandas 2.0; pd.concat is its replacement.)

 result = df1.append(df2)


Concat ignoring index

If the indexes of the two tables have no real meaning, set ignore_index=True: the tables are aligned on their column fields, merged, and the result is given a fresh integer index.
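A small sketch of the re-indexing behaviour:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3']}, index=[0, 1])   # overlapping index

# ignore_index=True discards both original indexes and
# assigns a fresh RangeIndex to the merged result.
result = pd.concat([df1, df2], ignore_index=True)
print(result.index.tolist())   # [0, 1, 2, 3]
```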

Add a key to distinguish data groups while merging

The keys parameter mentioned above can add keys to the merged table to distinguish the different source tables.

  1. Pass the keys parameter directly:
result = pd.concat(frames, keys=['x', 'y', 'z'])


  2. Pass in a dictionary; its keys become the grouping keys:
pieces = {'x': df1, 'y': df2, 'z': df3}
result = pd.concat(pieces)


Add new row to dataframe

The append method can insert a Series or a dict as a new row of the DataFrame.

s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])
result = df1.append(s2, ignore_index=True)
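On pandas 2.0+, where append is gone, the same row insertion can be sketched with pd.concat:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0'], 'B': ['B0'], 'C': ['C0'], 'D': ['D0']})
s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])

# df1.append(s2, ignore_index=True) was removed in pandas 2.0;
# turning the Series into a one-row frame and concatenating is equivalent.
result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
print(result['A'].tolist())   # ['A0', 'X0']
```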


Merge tables with different table column fields

If the column fields of the two tables differ but you still want to merge them, the missing values are filled with NaN; combine this with ignore_index=True:

dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},{'A': 5, 'B': 6, 'C': 7, 'Y': 8}]
result = df1.append(dicts, ignore_index=True)
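Likewise, on current pandas the dict version can be sketched with pd.concat; NaN fills the fields each dict lacks:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0'], 'B': ['B0'], 'C': ['C0'], 'D': ['D0']})
dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},
         {'A': 5, 'B': 6, 'C': 7, 'Y': 8}]

# df1.append(dicts, ...) is gone in pandas 2.0; building a frame from the
# dicts and concatenating gives the same result, with NaN for missing fields.
result = pd.concat([df1, pd.DataFrame(dicts)], ignore_index=True)
print(sorted(result.columns))        # ['A', 'B', 'C', 'D', 'X', 'Y']
print(result['D'].isna().tolist())   # [False, True, True]
```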

