1. Detecting and marking duplicate elements
1.1 **Function:** duplicated(): df.duplicated(subset=None, keep='first' / 'last' / False)
1.2 Parameter analysis:
A. subset: a column label or a list of column labels. Only the listed columns are compared, and rows whose values match in those columns count as duplicates. The default is None, which means all columns are considered;
B. keep = 'first' / 'last' / False: 'first' (the default) marks every occurrence except the first as a duplicate; 'last' marks every occurrence except the last as a duplicate; False marks all identical occurrences as duplicates;
C. duplicated() returns a boolean Series that flags whether each value of a Series, or each row of a DataFrame, is a duplicate: True if it is repeated, False if it is not;
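The return value described in item C can be sketched as follows. This is a minimal illustration, not the article's original example: the Series `s`, the DataFrame `df`, and their column names are hypothetical data made up here.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
s = pd.Series(["a", "b", "a", "c", "b"])

# On a Series, duplicated() flags each repeated value (default keep='first').
series_mask = s.duplicated()
print(series_mask.tolist())  # [False, False, True, False, True]

df = pd.DataFrame({
    "name": ["Tom", "Ann", "Tom", "Tom"],
    "score": [90, 85, 90, 70],
})

# On a DataFrame, a row is a duplicate only if every column matches
# an earlier row (subset=None compares all columns).
row_mask = df.duplicated()
print(row_mask.tolist())  # [False, False, True, False]
```

Because the result is a boolean Series aligned with the original index, it can be used directly as a filter, for example `df[row_mask]` to view only the repeated rows.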
1.3 Hands-on practice:
A. keep='first'
B. keep='last'
C. keep=False
D. subset: pass a list of column labels; only the columns in that list are used to detect duplicates;
E. Find the duplicate rows and delete them;
Using drop: first find the index labels of the rows to delete, then delete the data by those index labels;
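The examples that accompanied items A to E are not present in the text, so the following is a stand-in sketch of what each case might look like. The DataFrame and its `name` / `city` columns are hypothetical; only `duplicated()` and `drop()` themselves come from the article.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "name": ["Tom", "Ann", "Tom", "Ann", "Bob"],
    "city": ["NY", "LA", "NY", "SF", "LA"],
})

# A. keep='first': the first (Tom, NY) row is kept, the later one is flagged.
first_mask = df.duplicated(keep="first")
print(first_mask.tolist())  # [False, False, True, False, False]

# B. keep='last': the last (Tom, NY) row is kept, the earlier one is flagged.
last_mask = df.duplicated(keep="last")
print(last_mask.tolist())   # [True, False, False, False, False]

# C. keep=False: every row that has an identical twin is flagged.
all_mask = df.duplicated(keep=False)
print(all_mask.tolist())    # [True, False, True, False, False]

# D. subset: compare on the "name" column only.
name_mask = df.duplicated(subset=["name"])
print(name_mask.tolist())   # [False, False, True, True, False]

# E. Find the index labels of the duplicate rows, then drop by index.
dup_index = df[df.duplicated()].index
cleaned = df.drop(index=dup_index)
print(cleaned.index.tolist())  # [0, 1, 3, 4]
```

Note that `drop()` returns a new DataFrame here; `df` itself still holds all five rows.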
2. Remove duplicate elements
2.1 **Function:** drop_duplicates(): df.drop_duplicates(subset=None, keep='first', inplace=False)
2.2 Parameter analysis:
A. drop_duplicates() removes the rows of a DataFrame that are repeated in the specified columns and returns the result as a DataFrame;
B. subset: specifies the columns to compare; all columns are used by default;
C. keep: takes one of three values, {'first', 'last', False}; the default is 'first', which removes the duplicates and retains the first occurrence;
D. inplace: whether to modify the original data directly (True) or leave it untouched and return a de-duplicated copy (False, the default). With inplace=True the method returns None.
2.3 Hands-on practice:
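The original example for this section is missing from the text, so here is a minimal sketch of the three parameters in use. The DataFrame and its `name` / `score` columns are hypothetical data chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "name": ["Tom", "Ann", "Tom", "Ann"],
    "score": [90, 85, 90, 60],
})

# Default: compare all columns, keep the first occurrence.
deduped = df.drop_duplicates()
print(deduped.index.tolist())       # [0, 1, 3]

# subset + keep='last': one row per name, keeping the last one seen.
by_name_last = df.drop_duplicates(subset=["name"], keep="last")
print(by_name_last.index.tolist())  # [2, 3]

# keep=False: discard every row that has an identical twin.
no_repeats = df.drop_duplicates(keep=False)
print(no_repeats.index.tolist())    # [1, 3]

# inplace=True modifies the object itself and returns None.
in_place = df.copy()
result = in_place.drop_duplicates(inplace=True)
print(result)         # None
print(len(in_place))  # 3
```

The surviving rows keep their original index labels; if a clean 0..n-1 index is wanted, calling `reset_index(drop=True)` on the result is one option.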
Editor's note: the content of this article draws on reference and study materials. It took some effort to put together, so if you found it useful, please leave a like ~