[Repost] Remove \xa0, \t and \n from strings in Python

Today I helped my girlfriend scrape some information from the Internet, but the extracted text contained "\xa0" characters that I couldn't get rid of at first. After some reading, I found that this character is a kind of space: the non-breaking space.

\xa0 is the non-breaking space character. The ordinary space we usually type is \x20, which lies in the range of printable ASCII characters, 0x20 to 0x7e.
\xa0, by contrast, is one of the extended characters of the Latin-1 character set (ISO/IEC 8859-1), where it represents the non-breaking space (written &nbsp; in HTML).
Latin-1 is backward compatible with ASCII: the printable characters 0x20 to 0x7e have the same codes in both. It is also widely used; MySQL, for example, used latin1 as its default character set before version 8.0.
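To see the difference for yourself, the short sketch below compares the two characters using only the standard library. It assumes Python 3, where str is Unicode, and shows the code points, the official Unicode name of \xa0, and how it is encoded in Latin-1 versus UTF-8.

```python
import unicodedata

nbsp = "\xa0"   # non-breaking space
space = " "     # ordinary space, \x20

# Both look like blank space, but they are different code points.
print(hex(ord(space)))          # 0x20
print(hex(ord(nbsp)))           # 0xa0
print(unicodedata.name(nbsp))   # NO-BREAK SPACE
print(nbsp == space)            # False

# In Latin-1 the character is the single byte 0xA0; in UTF-8 it takes two bytes.
print(nbsp.encode("latin-1"))   # b'\xa0'
print(nbsp.encode("utf-8"))     # b'\xc2\xa0'
```

The two-byte UTF-8 form b'\xc2\xa0' is what you will often see when scraped HTML containing &nbsp; is saved as UTF-8.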
The scraped data looked like this (English terms paired with their Chinese translations):

'T-shirt\xa0\xa0短袖圆领衫,体恤衫\xa0,', 'V-neck\xa0\xa0V型领\xa0sleeve\xa0\xa0袖子\xa0,',

How do we remove the \xa0? Our first attempt, using the sub() function from the re module, didn't work, so we read up on the problem and eventually solved it. Here is how:

>>> inputstring = u'\n                      Door:\xa0Novum          \t      '
>>> move = dict.fromkeys((ord(c) for c in u"\xa0\n\t"))
>>> output = inputstring.translate(move)
>>> output
'                      Door:Novum                '
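If you need this more than once, a small helper may be convenient. The sketch below is one possible version, assuming Python 3; strip_chars is a hypothetical name, not part of any library. It uses str.maketrans, whose third argument lists characters to delete, to build the same kind of table as the dict.fromkeys trick above.

```python
def strip_chars(text, chars="\xa0\n\t"):
    """Hypothetical helper: delete every occurrence of each character in `chars`."""
    # str.maketrans("", "", chars) maps the code point of each character in
    # `chars` to None, which tells str.translate to delete that character.
    table = str.maketrans("", "", chars)
    return text.translate(table)


inputstring = "\n   Door:\xa0Novum   \t   "
print(repr(strip_chars(inputstring)))   # '   Door:Novum      '
```

Note that ordinary spaces survive: only the characters passed in `chars` are removed.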

There is also a simpler way, using split() and join():

>>> s
'T-shirt\xa0\xa0短袖圆领衫,体恤衫\xa0'
>>> out = "".join(s.split())
>>> out
'T-shirt短袖圆领衫,体恤衫'
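Because split() treats every whitespace run as a separator, joining with "" also glues together words that were separated by ordinary spaces. The sketch below, assuming Python 3 and a made-up sample string, compares joining with "" and with a single space.

```python
s = "Door:\xa0Novum  second\tline\n"

# Joining with "" removes every whitespace run, ordinary spaces included.
squeezed = "".join(s.split())

# Joining with " " keeps word boundaries: each whitespace run becomes one
# ordinary space, and leading/trailing whitespace is dropped.
normalized = " ".join(s.split())

print(repr(squeezed))     # 'Door:Novumsecondline'
print(repr(normalized))   # 'Door: Novum second line'
```

The " ".join(...) variant is often the safer choice for free text where word boundaries matter.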

Both translate() and split() solve the problem, and both also remove \t and \n. One difference: split() combined with join("") removes ordinary spaces as well, whereas translate() deletes only the characters you list. Lesson learned!
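For the record, in Python 3 re.sub can also remove \xa0 when it is applied to a str (rather than bytes). A minimal sketch, assuming str input; the sample strings are illustrative:

```python
import re

s = "T-shirt\xa0\xa0短袖圆领衫,体恤衫\xa0"

# A literal \xa0 in the pattern matches the non-breaking space in a str.
no_nbsp = re.sub("\xa0", "", s)

# For str patterns, \s matches Unicode whitespace, which includes \xa0,
# \t and \n (and ordinary spaces), so one pattern removes all of them.
cleaned = re.sub(r"\s+", "", "Door:\xa0Novum\t\n")

print(no_nbsp)   # T-shirt短袖圆领衫,体恤衫
print(cleaned)   # Door:Novum
```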

About the ord function:
ord() is the inverse of chr(): it takes a string of exactly one character and returns that character's Unicode code point as an integer (for ASCII characters this equals the ASCII code). In Python 2, chr() covered only 8-bit values and unichr() handled Unicode characters. If the argument is not a single character, ord() raises a TypeError.
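A quick check of ord() and chr() in Python 3, using the characters discussed in this article:

```python
# ord() and chr() undo each other for any single character.
# This loop prints the code points 9, 10, 32 and 160 in turn.
for ch in ["\t", "\n", " ", "\xa0"]:
    code = ord(ch)
    assert chr(code) == ch
    print(repr(ch), code, hex(code))

# A string that is not exactly one character long raises TypeError.
try:
    ord("ab")
except TypeError as exc:
    print("TypeError:", exc)
```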

About the fromkeys method:
dict.fromkeys(iterable, value) creates a new dictionary whose keys are taken from iterable and whose values are all set to value, which defaults to None. Here the iterable is a generator expression that yields the code point of each of the three characters, so move becomes {160: None, 10: None, 9: None}.
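The table built in the translate example can be inspected step by step. A small sketch, assuming Python 3.7 or later (where dicts keep insertion order):

```python
chars = "\xa0\n\t"

# The generator expression yields one code point per character: 160, 10, 9.
codes = (ord(c) for c in chars)

# dict.fromkeys uses each item as a key and gives every key the same value,
# None by default.
move = dict.fromkeys(codes)
print(move)  # {160: None, 10: None, 9: None}

# A list of integers works just as well; the generator only avoids building it.
assert dict.fromkeys([160, 10, 9]) == move
```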

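About the translate method:
In Python 3, str.translate(table) looks up each character's code point in table, which maps code points to a replacement string, a replacement code point, or None. Characters mapped to None are deleted, and characters not in the table are left unchanged. Because every value in move is None, translate() deletes all three characters. (The 256-character table with a separate argument listing characters to delete belongs to Python 2's byte-string str.translate.)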
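Deleting is only one option. The sketch below, assuming Python 3 and an illustrative input string, uses all three kinds of table values, so \xa0 and \t become ordinary spaces while \n is dropped.

```python
text = "Door:\xa0Novum\t12\n"

# Each value in a str.translate() table can be:
#   a string -> the character is replaced by that string
#   an int   -> the character is replaced by the character with that code point
#   None     -> the character is deleted
table = {
    0xA0: " ",        # non-breaking space -> ordinary space
    ord("\t"): 32,    # tab -> code point 32, an ordinary space
    ord("\n"): None,  # newline -> deleted
}

print(repr(text.translate(table)))  # 'Door: Novum 12'
```

Replacing \xa0 with a normal space instead of deleting it keeps words such as "Door: Novum" from running together.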

About the join method:
sep.join(iterable) concatenates the strings in an iterable such as a list or tuple, inserting sep between them, and returns a new string. Joining with the empty string "" glues the pieces produced by split() back together with nothing in between, which is what makes this one-liner so neat.
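A few join() calls in Python 3, using illustrative English words, show how the separator changes the result:

```python
parts = ["T-shirt", "short-sleeved", "crew-neck"]

print("".join(parts))              # T-shirtshort-sleevedcrew-neck
print(", ".join(parts))            # T-shirt, short-sleeved, crew-neck
print("-".join(("a", "b", "c")))   # a-b-c

# Every item must already be a string, so convert numbers first.
print(" ".join(str(n) for n in [1, 2, 3]))  # 1 2 3
```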

Note that split() with no argument splits on runs of any whitespace, including spaces, tabs, newlines and, for str in Python 3, \xa0, and it drops empty strings at the start and end of the result.
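The contrast with an explicit separator is easy to see. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
s = "  a\xa0b \t c\n"

# No argument: split on runs of any whitespace and drop empty strings at the ends.
print(s.split())      # ['a', 'b', 'c']

# Explicit " ": split on each single ordinary space only. \xa0, \t and \n
# are kept, and consecutive separators produce empty strings.
print(s.split(" "))   # ['', '', 'a\xa0b', '\t', 'c\n']
```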


Origin http://43.154.161.224:23101/article/api/json?id=324927562&siteId=291194637