Using regular expressions to replace invisible characters in Hive SQL

1. Replace whitespace characters in strings

To remove spaces at the beginning and end of a string, the trim(), ltrim(), and rtrim() functions are commonly used, but they cannot remove spaces in the middle of a string.
In that case, a regular expression is usually used to replace the whitespace characters. \s matches any whitespace character, including spaces, tabs, line breaks, and form feeds; in the Java regex engine that Hive uses, it is equivalent to [ \t\n\x0B\f\r]. Inside a Hive string literal the backslash must itself be escaped, so the pattern is written '\\s+'.

select regexp_replace('  abcd  ef  g ','\\s+','');

Return result:
'abcdefg'
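
For contrast, here is a minimal sketch of what trim() leaves behind, plus a variant that collapses each run of whitespace to a single space instead of deleting it. The string literal is the same sample value as above. The second query is an illustrative variation, not part of the original recipe.

```sql
-- trim() strips only leading and trailing spaces; interior spaces stay.
select trim('  abcd  ef  g ');
-- 'abcd  ef  g'

-- Collapse every whitespace run to one space, then trim the ends.
select trim(regexp_replace('  abcd  ef  g ', '\\s+', ' '));
-- 'abcd ef g'
```

The collapsing form is often the safer choice for free-text columns, where removing all whitespace would glue separate words together.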

2. Replace illegal characters that cannot be parsed in the string

In practice, invisible characters can remain in a string even after the '\\s+' replacement. These are usually control characters that do not count as whitespace and were not stripped when the data was parsed or loaded.
In that case, use regexp_replace(col_name,'[\\x00-\\x08\\x0B-\\x0C\\x0E-\\x1F]+|\\s+','') to remove the ASCII control characters (0x00 to 0x1F) along with the whitespace.
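
Before cleaning, it can help to see which rows are affected and which bytes are involved. The following is a sketch only: raw_table is a hypothetical table name, col_name stands for the column being cleaned, and the limit is arbitrary. It uses Hive's built-in hex() and rlike to show the raw bytes next to the cleaned value.

```sql
-- raw_table is a hypothetical table; col_name is the column being cleaned.
select
    col_name,
    hex(col_name) as raw_hex,  -- hex dump exposes the otherwise invisible bytes
    regexp_replace(col_name,
                   '[\\x00-\\x08\\x0B-\\x0C\\x0E-\\x1F]+|\\s+',
                   '') as cleaned
from raw_table
-- keep only rows that contain at least one of the targeted control characters
where col_name rlike '[\\x00-\\x08\\x0B-\\x0C\\x0E-\\x1F]'
limit 20;
```

Comparing raw_hex with cleaned makes it easier to confirm that the pattern removes exactly the characters that caused the problem, and to spot bytes outside the 0x00 to 0x1F range that this pattern does not cover.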

Origin blog.csdn.net/p1306252/article/details/131762212