Data transformation operations
Importing files
import numpy as np
import pandas as pd
data = pd.read_csv('example.csv')
These three lines import a CSV file; pay attention to the file's path.
Deleting rows
Data1 = data.drop([16,17])
drop() method
If the parameter inplace=True is not set, the deletion only takes effect in the newly returned data block; the corresponding rows of the original data block are left untouched. In other words, only inplace makes drop() remove rows from the original data. inplace=True usage:
data.drop(data.index[[16, 17]], inplace=True)
Note the difference between using and not using inplace. Without inplace, we keep the processed data in another variable (Data1 above); with inplace=True, the operation is applied directly to the original data. It is worth noting that inplace never modifies the original file, so it is safe: although the rows are deleted from the original variable, they are only removed in memory, and the source file on disk remains intact.
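A minimal, self-contained sketch of the two styles (the 20-row frame and the column name are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for the imported CSV data (hypothetical values).
data = pd.DataFrame({'value': range(20)})

# Without inplace: the result must be captured in a new variable;
# `data` itself is unchanged by this call.
data1 = data.drop([16, 17])

# With inplace=True: `data` itself is modified, nothing is returned.
data.drop(data.index[[16, 17]], inplace=True)

print(len(data1), len(data))  # both are now 18 rows
```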
Deleting a column
del data['date']
As shown above, the column is deleted directly. Note that del accepts only one column name inside the square brackets, so only one column can be deleted at a time.
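A runnable sketch (the 'date' and 'value' columns are made-up names for illustration):

```python
import pandas as pd

# Minimal frame with a 'date' column (assumed contents).
data = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                     'value': [1, 2]})

del data['date']            # removes exactly one column, in place
print(list(data.columns))   # only 'value' remains
```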
pop() method
The pop() method pops the selected column out of the original data block; the original data block no longer retains that column.
Data1 = data.pop('latitude')
pop() extracts an individual column, which is very useful when we are particularly interested in one specific piece of the data.
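A self-contained sketch (the coordinate columns and values are stand-ins):

```python
import pandas as pd

data = pd.DataFrame({'latitude': [31.2, 39.9],
                     'longitude': [121.5, 116.4]})

# pop() returns the column as a Series and removes it from `data`.
lat = data.pop('latitude')

print(type(lat).__name__)    # Series
print(list(data.columns))    # only 'longitude' is left
```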
Use of split()
Simple Python string splitting. During data preprocessing we often need to handle strings that contain various delimiter symbols, and the pieces must be processed separately at runtime, so we use Python's built-in split() function.
s = 'www.google.com'
print(s)
s_split = s.split('.')
print(s_split)
# The output is:
# www.google.com
# ['www', 'google', 'com']
If we want to limit the number of splits, pass a maxsplit argument to split():
s_split = s.split('.', 1)
# which gives:
# ['www', 'google.com']
That is, only the first occurrence of the separator is split on; the rest of the string stays intact.
DataFrame merging: the concat() operation
concat() is a function in pandas that performs a simple merge of data along a chosen axis.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False)
Parameter description
objs: a list or other sequence of Series or DataFrame objects
axis: the axis to concatenate along; 0 merges rows (stacks vertically), 1 merges columns (side by side)
join: the join mode, 'inner' or 'outer'
Vertical concatenation: tables with the same fields are stacked end to end.
# First collect the tables into a list, then pass the list to concat
frames = [df1, df2, df3]
result = pd.concat(frames)
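Since df1, df2 and df3 are never defined in the text, here is a self-contained sketch with small stand-in frames:

```python
import pandas as pd

# Stand-in frames with the same fields (hypothetical contents).
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])
df3 = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']}, index=[4, 5])

frames = [df1, df2, df3]
result = pd.concat(frames)   # stacks the frames row-wise (axis=0)
print(result.shape)          # (6, 2)
```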
To add an extra key level during concatenation, identifying which table each row came from, add the keys parameter:
result = pd.concat(frames, keys=['x', 'y', 'z'])
Horizontal concatenation (aligned on the row index)
When axis=1, concat aligns the tables on their row index and places tables with different column names side by side.
result = pd.concat([df1, df4], axis=1)
join
Through the join parameter: 'inner' keeps the intersection of the two tables' indexes, while 'outer' keeps the union.
result = pd.concat([df1, df4], axis=1, join='inner')
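A runnable sketch; df1 and df4 are stand-ins with partially overlapping indexes, chosen to make the outer/inner difference visible:

```python
import pandas as pd

# Stand-in frames (assumed contents); their indexes overlap only at label 2.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df4 = pd.DataFrame({'B': ['B2', 'B3']}, index=[2, 3])

outer = pd.concat([df1, df4], axis=1)                # union of the indexes
inner = pd.concat([df1, df4], axis=1, join='inner')  # intersection only

print(outer.shape)  # (4, 2) -- labels 0,1,2,3
print(inner.shape)  # (1, 2) -- label 2 only
```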
join_axes
If join_axes is passed, you can specify which axis the data is aligned to. For example, aligning to the df1 table keeps exactly df1's index, and df4 is spliced onto it. Note that join_axes was deprecated in pandas 0.25 and removed in 1.0; on current versions, call .reindex() on the concat result instead.
result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
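The join_axes form above only runs on pandas older than 1.0. On current pandas, the same alignment is obtained by reindexing the result; a sketch with stand-in frames:

```python
import pandas as pd

# Stand-in frames (assumed contents).
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df4 = pd.DataFrame({'B': ['B2', 'B3']}, index=[2, 3])

# Equivalent of join_axes=[df1.index] on modern pandas:
# concat with the default outer join, then keep only df1's index.
result = pd.concat([df1, df4], axis=1).reindex(df1.index)

print(list(result.index))   # [0, 1, 2] -- exactly df1's rows
```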
append
append() is a method on Series and DataFrame; by default it concatenates along axis=0, aligning on columns (i.e. it stacks rows). Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions, use pd.concat instead.
result = df1.append(df2)
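A runnable sketch with stand-in frames; the concat form shown works on both old and current pandas, unlike append:

```python
import pandas as pd

# Stand-in frames (assumed contents).
df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3']}, index=[2, 3])

# df1.append(df2) on pandas < 2.0; the equivalent that always works:
result = pd.concat([df1, df2])

print(result['A'].tolist())  # ['A0', 'A1', 'A2', 'A3']
```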
Concat ignoring index
If the indexes of the two tables carry no real meaning, set ignore_index=True: the merged tables are aligned on their column fields, concatenated, and then given a fresh sequential index.
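A sketch of ignore_index with stand-in frames whose original indexes collide:

```python
import pandas as pd

# Stand-in frames whose indexes overlap and carry no meaning.
df1 = pd.DataFrame({'A': [1, 2]}, index=[10, 11])
df2 = pd.DataFrame({'A': [3, 4]}, index=[10, 11])

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
result = pd.concat([df1, df2], ignore_index=True)

print(list(result.index))  # [0, 1, 2, 3]
```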
Add a key to distinguish data groups while merging
The keys parameter mentioned earlier adds keys to the merged table to distinguish the different table data sources.
- Implemented directly with the key parameter
result = pd.concat(frames, keys=['x', 'y', 'z'])
- Pass in a dictionary to add grouping keys
pieces = {'x': df1, 'y': df2, 'z': df3}
result = pd.concat(pieces)
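A self-contained sketch of the dictionary form (stand-in frames; the dict keys become the outer index level):

```python
import pandas as pd

# Stand-in frames (assumed contents).
df1 = pd.DataFrame({'A': ['A0']})
df2 = pd.DataFrame({'A': ['A1']})

pieces = {'x': df1, 'y': df2}
result = pd.concat(pieces)   # dict keys become the outer level of a MultiIndex

print(result.index.get_level_values(0).tolist())  # ['x', 'y']
```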
Add new row to dataframe
The append method can take a Series or a dict and insert it as a new row of the DataFrame.
s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])
result = df1.append(s2, ignore_index=True)
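df1.append(s2, ...) only runs on pandas older than 2.0; an equivalent with concat, using a stand-in df1 shaped like the text's example:

```python
import pandas as pd

# Stand-in frame with one row (assumed contents).
df1 = pd.DataFrame([['A0', 'B0', 'C0', 'D0']], columns=['A', 'B', 'C', 'D'])
s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])

# The Series' index becomes column labels when turned into a one-row frame.
result = pd.concat([df1, s2.to_frame().T], ignore_index=True)

print(result.shape)  # (2, 4)
```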
Merge tables with different table column fields
If the column fields of the two tables differ but you still want to merge them, the missing values are represented by NaN. Passing ignore_index=True makes this work.
dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},{'A': 5, 'B': 6, 'C': 7, 'Y': 8}]
result = df1.append(dicts, ignore_index=True)
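As above, append with a list of dicts was removed in pandas 2.0; an equivalent with concat, using a stand-in df1:

```python
import pandas as pd

# Stand-in frame (assumed contents); it lacks the X and Y columns.
df1 = pd.DataFrame([['A0', 'B0', 'C0']], columns=['A', 'B', 'C'])

dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},
         {'A': 5, 'B': 6, 'C': 7, 'Y': 8}]

# Columns are unioned; positions with no value are filled with NaN.
result = pd.concat([df1, pd.DataFrame(dicts)], ignore_index=True)

print(sorted(result.columns))  # ['A', 'B', 'C', 'X', 'Y']
```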