Cleaning strings correctly in Python (pitfalls I ran into myself)

Anyone who has processed survey data submitted by users knows how messy it can be. Getting a uniformly formatted set of strings that is usable for analysis takes several steps: stripping whitespace, removing punctuation, and normalizing capitalization. One approach is to combine the built-in string methods with the re regular expression module:

The usual approach

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
         'south   carolina##', 'West virginia?']

import re

def clean_strings(strings):  # typical data-cleaning steps, hard-coded in order
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [173]: clean_strings(states)
Out[173]: 
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

Recommended approach

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]  # functions are objects too

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [175]: clean_strings(states, clean_ops)
Out[175]: 
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']
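
One benefit of passing the operations as a list is that a new step can be added without changing clean_strings itself. The following is only a minimal sketch: collapse_whitespace is a hypothetical helper that does not appear in the original post, added here to squeeze the repeated spaces left in 'South   Carolina'.

```python
import re

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

def collapse_whitespace(value):
    # Hypothetical helper (an assumption, not from the original post):
    # replace every run of whitespace with a single space.
    return re.sub(r'\s+', ' ', value)

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

# The new step is inserted into the list; clean_strings is unchanged.
clean_ops = [str.strip, remove_punctuation, collapse_whitespace, str.title]
cleaned = clean_strings(states, clean_ops)
print(cleaned)
```

With this list of operations, the sixth entry should come out as 'South Carolina' with a single space, while the other entries are unchanged.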

# Alternatively, apply a single function to every element with map
In [176]: for x in map(remove_punctuation, states):
   .....:     print(x)
   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia
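
Note that map applies only the one function it is given, so here just the punctuation is removed and whitespace and capitalization are left as they were. In Python 3, map also returns a lazy iterator rather than a list. A short sketch of two ways to collect the results (the variable names are illustrative only):

```python
import re

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# map is lazy in Python 3; wrap it in list() to materialize the results.
no_punct = list(map(remove_punctuation, states))

# An equivalent list comprehension.
no_punct_comp = [remove_punctuation(s) for s in states]

print(no_punct == no_punct_comp)
```

Which form to use is mostly a matter of taste; the comprehension tends to read more naturally once a condition or a second transformation is involved.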

Origin www.cnblogs.com/BigBears/p/11962704.html