4-1 How to split a string containing multiple delimiters

1. Using str.split()

The previous article, "Python calls shell commands", showed how to call system commands from Python.

(1) Get the Windows process list

>>> import os
>>> tmp = os.popen('tasklist').readlines()
>>> tmp

(2) Take the last line of the output

>>> s = tmp[-1]
>>> s

'tasklist.exe                  6524 Console                    1      5,740 K\n'

(3) Split the string

str.split() usage

>>> help(s.split)
Help on built-in function split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings
    
    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

When the fields are separated by whitespace (spaces, \t, \r, \n, and so on), the sep argument can be omitted or passed as None.

 

>>> s.split()

['tasklist.exe', '6524', 'Console', '1', '5,740', 'K']
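The difference between omitting the separator and passing one explicitly is easy to miss. The following is a minimal sketch that reuses the tasklist line above as sample data; the variable names are illustrative only.

```python
# Sample line copied from the tasklist output shown earlier.
line = 'tasklist.exe                  6524 Console                    1      5,740 K\n'

# No separator: a run of whitespace counts as one delimiter,
# and empty strings are dropped from the result.
fields = line.split()

# Explicit single-space separator: every space is a delimiter,
# so consecutive spaces produce empty strings.
raw = line.split(' ')

# maxsplit=1 stops after the first split; the remainder stays in one piece.
name, rest = line.split(None, 1)

print(fields)
print('' in raw)
print(name)
```

For column-style output like this, the no-argument form is usually the one you want, because the padding between columns varies in width.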

2. Call str.split() once per delimiter

>>> s = 'ab;cd|df,oi.kjqw;soic\\sf'
>>> def mysplit(s,ds):
    res = [s]             # Wrap the string s in a list that holds one large string
    for d in ds:
        t = []
        # Split every string in res on d and add the pieces to t.
        # list() forces evaluation, because map() is lazy in Python 3.
        list(map( lambda x : t.extend(x.split(d)) , res))
        res = t    # t becomes the input list for the next delimiter
    return [x for x in res if x]   # Drop the empty strings produced by adjacent delimiters

>>> mysplit(s,";|,.\\")

['ab', 'cd', 'df', 'oi', 'kjqw', 'soic', 'sf']

The map() function was introduced in "2-5 Searching for the Common Key of a Dictionary". It takes two arguments, a function and an iterable, applies the function to each element in turn, and returns the results: a list in Python 2, a lazy iterator in Python 3. In this example, the lambda's parameter x is each element of the res list (on the first pass, res holds only one element, the whole string).
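Because map() is lazy in Python 3, a map() call made only for its side effects does nothing until its result is consumed. The sketch below assumes Python 3 and uses made-up sample data to show this.

```python
pieces = []

# Build the map object; in Python 3 the lambda has not run yet.
lazy = map(lambda x: pieces.extend(x.split(';')), ['a;b', 'c'])
before = list(pieces)      # snapshot taken before the map is consumed

# Consuming the iterator runs the lambda once per element.
results = list(lazy)       # extend() returns None, so this is a list of None
after = list(pieces)

print(before)
print(results)
print(after)
```

Under Python 3, before is empty and after holds the split pieces. Under Python 2, map() runs eagerly, so pieces would already be filled when the snapshot is taken.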

 

The list method extend() was shown in "2-1 How to filter data according to conditions in the list dictionary collection". It appends each element of its argument to the list, so the list stays one-dimensional no matter what is added. lambda was also introduced there.
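The same idea can be written with a plain for loop instead of map() and lambda, which avoids relying on side effects inside map(). This is only a sketch of an equivalent form; mysplit_loop is a hypothetical name, not something from the original article.

```python
def mysplit_loop(s, ds):
    """Split s on every single-character delimiter in ds (hypothetical helper)."""
    res = [s]
    for d in ds:
        t = []
        for piece in res:
            # extend() adds the pieces one by one, so t stays flat.
            t.extend(piece.split(d))
        res = t
    # Drop the empty strings left by adjacent delimiters.
    return [x for x in res if x]


s = 'ab;cd|df,oi.kjqw;soic\\sf'
result = mysplit_loop(s, ';|,.\\')
print(result)
```

The loop version behaves the same way in Python 2 and Python 3, because nothing depends on whether map() is eager or lazy.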

 

Deriving the mysplit() function

>>> s
'ab;cd|df,oi.kjqw;soic\\sf'
>>> s1 = s.split(';')
>>> s1

['ab', 'cd|df,oi.kjqw', 'soic\\sf']

s.split() returns a list of strings, and the list s1 itself has no split() method. To split each element of s1, use the map() function, which applies a given function to every element of an iterable. Here each element is a string, so split() can be called on it.

>>> new = list(map(lambda x : x.split('|'),s1))
>>> new

[['ab'], ['cd', 'df,oi.kjqw'], ['soic\\sf']]

After two rounds of splitting, new is a two-dimensional list (a list of lists). This is why mysplit() starts each round with an empty list t and uses extend() to flatten the pieces back into a one-dimensional list.
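The flattening step can be looked at on its own. The sketch below starts from the s1 list shown above and flattens the nested result in two ways: with extend(), as mysplit() does, and with itertools.chain.from_iterable, a standard-library alternative that the original article does not use.

```python
from itertools import chain

s1 = ['ab', 'cd|df,oi.kjqw', 'soic\\sf']

# Splitting each element gives a list of lists.
nested = [x.split('|') for x in s1]

# Flatten with extend(): start from an empty list and add the pieces.
flat = []
for parts in nested:
    flat.extend(parts)

# Flatten with the standard library instead.
flat_chain = list(chain.from_iterable(nested))

print(nested)
print(flat)
print(flat == flat_chain)
```

Using append() instead of extend() in the loop would rebuild the nested list, which is exactly the shape the next split() call cannot handle.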

3. Use the re.split() method

>>> import re
>>> re.split(r'[;|,.\\]+',s)

['ab', 'cd', 'df', 'oi', 'kjqw', 'soic', 'sf']

>>> help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.

Regular expressions were introduced in "2-3 Statistical Frequency of Sequence Elements". The first parameter of re.split() is the regular expression pattern. In this example, [;|,.\\] matches any one of the characters inside the square brackets, and the trailing + means one or more of them in a row, so a run of adjacent delimiters counts as a single separator.
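To see what the + contributes, compare the pattern with and without it on a string that contains two delimiters in a row. The sketch below also builds the character class from a plain delimiter string with re.escape; the sample string and variable names are assumptions made for illustration.

```python
import re

# Sample string with a doubled ';' to show the effect of '+'.
s = 'ab;;cd|df,oi.kjqw;soic\\sf'

# Without '+': each delimiter splits on its own, leaving an empty string.
without_plus = re.split(r'[;|,.\\]', s)

# With '+': a run of delimiters is treated as a single separator.
with_plus = re.split(r'[;|,.\\]+', s)

# Build the same character class from a delimiter string.
# re.escape() protects characters such as '\' that are special in a pattern.
delims = ';|,.\\'
pattern = '[' + re.escape(delims) + ']+'
from_delims = re.split(pattern, s)

print(without_plus)
print(with_plus)
print(from_delims)
```

A delimiter at the very start or end of the string still produces an empty string even with +, so filtering with a list comprehension, as mysplit() does, can still be useful.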

 

