pandas data analysis reading notes (3)

ser.map(): accepts a function or a dict-like object that defines the mapping, and applies it to every element of the Series

df.replace(to_replace, value): replaces one value with another. Several values can be replaced at once by passing lists, and each value can get its own replacement; the argument can also be a dictionary of {old: new} pairs
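A minimal sketch of map and replace on a Series. The data and the variable names (food, meat_to_animal, readings) are made up for illustration and are not from the notes.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate the two calls.
food = pd.Series(["bacon", "pastrami", "honey ham"])
meat_to_animal = {"bacon": "pig", "pastrami": "cow", "honey ham": "pig"}

# map with a dict: each element is looked up in the mapping.
animal_from_dict = food.map(meat_to_animal)

# map with a function: the function is called on each element.
animal_from_func = food.map(lambda name: meat_to_animal[name])

# replace with a dict: each key is replaced by its own value.
readings = pd.Series([1.0, -999.0, 2.0, -1000.0])
cleaned = readings.replace({-999.0: 0.0, -1000.0: -1.0})

print(animal_from_dict.tolist())
print(cleaned.tolist())
```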

df.rename(index=str.title, columns=str.upper): renames the axis labels and returns a new object. Here each index label is title-cased (the first letter of each word is capitalized) and each column name is converted to uppercase
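As a small illustration of rename with functions, assuming a made-up frame with lowercase labels:

```python
import pandas as pd

# Hypothetical frame; the labels are only for illustration.
frame = pd.DataFrame(
    [[0, 1], [2, 3]],
    index=["ohio", "new york"],
    columns=["one", "two"],
)

# rename returns a new DataFrame; frame itself is left unchanged.
renamed = frame.rename(index=str.title, columns=str.upper)

print(renamed.index.tolist())
print(renamed.columns.tolist())
```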

pd.cut(data, bins, labels=...): splits the data into bins at the edges given by bins. For example, with bins = [18, 25, 35, 60, 100] the intervals are open on the left and closed on the right: (18, 25], (25, 35], (35, 60], (60, 100]. Pass right=False to close the left side instead. The labels parameter sets the bin names. If an integer number of bins is passed instead of explicit edges, equal-width bins are computed from the minimum and maximum of the data

pd.qcut(data, q): splits the data by sample quantiles, so each bin holds roughly the same number of observations. Pass the number of quantiles, or custom quantiles between 0 and 1 such as [0, 0.1, 0.5, 0.9, 1.]
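The difference between cut (fixed edges) and qcut (equal frequency) can be sketched as follows. The ages, the label names, and the 1 to 100 value range are assumptions made for the example.

```python
import pandas as pd

# Hypothetical ages and label names.
ages = [20, 22, 25, 27, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
labels = ["youth", "young_adult", "middle_aged", "senior"]

# Default: right-closed intervals, so 25 falls in (18, 25].
named = pd.cut(ages, bins, labels=labels)

# right=False closes the left side, so 25 falls in [25, 35).
left_closed = pd.cut(ages, bins, right=False)

# qcut: bins are chosen so that each holds about the same number of values.
values = list(range(1, 101))
quartiles = pd.qcut(values, 4)
quartile_counts = pd.Series(quartiles).value_counts()

# Custom quantiles give bins of unequal size.
custom = pd.qcut(values, [0, 0.1, 0.5, 0.9, 1.0])
custom_counts = pd.Series(custom).value_counts()

print(named.tolist())
print(quartile_counts.tolist())
```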

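np.sign(): the sign function; returns -1, 0, or 1 for negative, zero, and positive values

np.random.permutation(n): returns a randomly ordered array of the integers 0 to n - 1, which can serve as a new row order

df.take(indices): selects rows by integer position, for example in the order produced by a permutation

df.sample(n=3, replace=True): selects a random subset of rows. The replace parameter controls whether sampling is done with replacement, that is, whether the same row can be drawn more than once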
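A short sketch of how these four calls fit together, assuming a small made-up 5 x 4 frame. The results are random, so no specific output is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 5 rows.
df = pd.DataFrame(np.arange(20).reshape(5, 4))

# Elementwise sign of each value.
signs = np.sign(np.array([-2.5, 0.0, 3.0]))

# A random ordering of the row positions 0..4.
order = np.random.permutation(5)

# take reorders the rows by integer position.
shuffled = df.take(order)

# Without replacement: 3 distinct rows.
subset = df.sample(n=3)

# With replacement: n may exceed the number of rows, rows can repeat.
with_replacement = df.sample(n=10, replace=True)

print(order)
print(shuffled.index.tolist())
```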

pd.get_dummies(df['key'], prefix='key'): converts a categorical variable into "dummy" (indicator) variables. The prefix parameter adds a prefix to the resulting column names. The result can be joined back onto the other columns: df_with_dummy = df[['data1']].join(dummies)

pd.unique(): returns the unique values, in order of first appearance

pd.get_dummies(pd.cut(values, bins)): combines get_dummies with cut to produce one indicator column per bin
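A minimal sketch of the dummy-variable workflow described above. The contents of df, values, and bins are made up; only the column names key and data1 come from the notes.

```python
import pandas as pd

# Hypothetical data using the column names from the notes.
df = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"], "data1": range(6)})

# One indicator column per distinct key, named key_a, key_b, key_c.
dummies = pd.get_dummies(df["key"], prefix="key")
df_with_dummy = df[["data1"]].join(dummies)

# Distinct keys in order of first appearance.
distinct_keys = pd.unique(df["key"])

# Combining cut and get_dummies: one indicator column per bin.
values = [0.1, 0.3, 0.5, 0.7, 0.9]
bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
binned = pd.get_dummies(pd.cut(values, bins))

print(df_with_dummy.columns.tolist())
print(binned.shape)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers; the column layout is the same either way.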

'::'.join(pieces): joins all elements of pieces, with two colons between them

 

Python built-in string methods:

count: returns the number of non-overlapping occurrences of a substring in the string

endswith, startswith: return True if the string ends with the given suffix or starts with the given prefix, respectively

join: uses the string as a delimiter to concatenate a sequence of other strings

index: returns the position of the first character of the first occurrence of a substring; raises ValueError if the substring is not found

find: returns the position of the first character of the first occurrence of a substring, or -1 if it is not found

rfind: returns the position of the first character of the last occurrence of a substring, or -1 if it is not found

replace: replaces occurrences of the specified substring with another string

strip, rstrip, lstrip: trim whitespace (including newlines) from both ends, the right end, or the left end

split: splits the string into a list of substrings at the specified delimiter

lower, upper: convert the string to lowercase and uppercase, respectively

ljust, rjust: left-justify or right-justify the string, padding the opposite side with spaces (or another fill character) to reach a minimum width
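The methods listed above can be tried on a small made-up string. The key contrast is that find returns -1 for a missing substring while index raises ValueError.

```python
# Hypothetical input string.
text = "a,b, guido"

# split, then strip the whitespace from each piece.
pieces = [part.strip() for part in text.split(",")]

# join with a two-colon delimiter.
joined = "::".join(pieces)

colon_count = joined.count(":")
first_colon = joined.find(":")
last_colon = joined.rfind(":")
missing = joined.find("x")  # find signals "not found" with -1

# index raises instead of returning -1.
try:
    joined.index("x")
    index_raised = False
except ValueError:
    index_raised = True

without_colons = joined.replace("::", "")
padded_left = "abc".ljust(5)        # pads on the right with spaces
padded_right = "abc".rjust(5, "*")  # pads on the left with '*'

print(joined)
print(first_colon, last_colon, missing, index_raised)
```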

 

ser.str.contains('gmail'): checks whether each string in the Series contains the given pattern
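A small sketch of the vectorized check, assuming a made-up Series of email addresses with one missing entry. How the missing entry is reported can depend on the dtype and pandas version, so the example passes na=False to make the result explicit.

```python
import pandas as pd

# Hypothetical addresses; the last entry is missing.
emails = pd.Series(
    {"Dave": "dave@google.com", "Steve": "steve@gmail.com", "Wes": None}
)

# na=False treats the missing entry as "does not contain".
has_gmail = emails.str.contains("gmail", na=False)

# The boolean result can be used directly as a filter.
gmail_users = emails[has_gmail].index.tolist()

print(has_gmail.tolist())
print(gmail_users)
```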

 

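Hierarchical indexing:

df.unstack(): pivots a level of the hierarchical row index (the innermost by default) into the columns

df.stack(): the inverse operation; pivots the columns into the innermost level of a hierarchical row index

df.swaplevel('key1', 'key2'): swaps the order of these two levels

df.sort_index(level=1): sorts by the values in level 1

frame.swaplevel(0, 1).sort_index(level=0): swaps the two levels, then sorts by the new outer level

frame.sum(level='key2'): computes summary statistics by level. The level argument of sum was removed in pandas 2.0; frame.groupby(level='key2').sum() gives the same result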
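The hierarchical-index operations can be sketched on a small frame. The level names key1 and key2 come from the notes; the values and column names are made up. The per-level sum uses groupby(level=...), which works on both older and current pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index named key1 / key2.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=["Ohio", "Colorado", "Texas"],
)

# unstack: the inner level (key2) becomes the columns.
wide = frame["Ohio"].unstack()

# stack: the columns go back into the inner index level.
long_again = wide.stack()

# swaplevel changes the level order; the data is not re-sorted.
swapped = frame.swaplevel("key1", "key2")

# Swap, then sort so that key2 is the sorted outer level.
sorted_by_key2 = frame.swaplevel(0, 1).sort_index(level=0)

# Sum within each value of key2.
totals_by_key2 = frame.groupby(level="key2").sum()

print(wide)
print(totals_by_key2)
```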

 

df.set_index(['a', 'd'], drop=True): converts one or more columns into the row index and returns a new DataFrame. The drop parameter controls whether those columns are removed from the frame; drop=False keeps them as columns as well

df.reset_index(): the inverse of set_index; moves the levels of the hierarchical index back into columns
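A minimal round trip through set_index and reset_index. The column names a and d come from the notes; column b and the values are made up.

```python
import pandas as pd

# Hypothetical frame; columns a and d will become the row index.
df = pd.DataFrame({"a": [0, 1, 2], "b": [7, 6, 5], "d": ["x", "y", "z"]})

# drop=True (the default): a and d move into the index.
indexed = df.set_index(["a", "d"])

# drop=False: a and d are in the index and also stay as columns.
kept = df.set_index(["a", "d"], drop=False)

# reset_index moves the index levels back into columns.
restored = indexed.reset_index()

print(indexed.columns.tolist())
print(restored.columns.tolist())
```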

Origin blog.csdn.net/u012724887/article/details/107100472