pandas from beginner to proficient - data cleaning - string processing


In this article, S is used to refer to Series, and Df is used to refer to DataFrame.
Data cleaning is an essential step when processing large and complex data. This series covers common data-cleaning methods, including handling missing values, duplicate values, and outliers, data type statistics, binning, random sampling, and vectorized encoding. Code and examples are given for each method and summarized in tables.

D. String processing

1. Python's built-in string methods

  • string.split(',') splits the string at each occurrence of the given delimiter and returns a list of substrings
  • string.strip() removes leading and trailing whitespace (spaces, tabs, newlines)
  • '::'.join(pieces) concatenates an iterable of strings, inserting '::' between consecutive elements
  • ',' in string tests whether ',' occurs anywhere in the string and returns True or False
  • string.index(',') returns the index of the first ','; if there is none, it raises a ValueError
  • string.find(',') returns the index of the first ','; if there is none, it returns -1
  • string.count(',') returns the number of non-overlapping occurrences of ','
  • string.replace(',', ' ') replaces every ',' with a space
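The snippet below is a minimal sketch that runs each of the built-in methods listed above on a made-up comma-separated string. The sample value `val` and all other variable names are illustrative only; nothing beyond the standard `str` methods is used, so no imports are needed.

```python
# Made-up sample record: comma-separated, with stray whitespace
val = "a,b,  guido"

# split(','): break the string at each comma
parts = val.split(",")                 # ['a', 'b', '  guido']

# strip(): remove leading/trailing whitespace from each piece
pieces = [p.strip() for p in parts]    # ['a', 'b', 'guido']

# join: glue the pieces back together with '::' between them
joined = "::".join(pieces)             # 'a::b::guido'

# 'in': membership test, returns a bool
has_comma = "," in val                 # True

# index() and find() both return the position of the first match...
first_index = val.index(",")           # 1
first_find = val.find(",")             # 1

# ...but behave differently when the substring is absent
missing_find = val.find(":")           # -1
try:
    val.index(":")
    index_raised = False
except ValueError:
    index_raised = True                # index() raised ValueError

# count(): number of non-overlapping occurrences
n_commas = val.count(",")              # 2
overlap_count = "aaaa".count("aa")     # 2, not 3: matches do not overlap

# replace(): substitute every comma with a space
replaced = val.replace(",", " ")       # 'a b   guido'
```

The practical difference between `index()` and `find()` is error handling: use `find()` when a missing substring is a normal case you want to check with `-1`, and `index()` when a missing substring should stop the program with an exception.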


Origin blog.csdn.net/qq_48081868/article/details/132512720