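Given the text 《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。 (roughly: "The Ordinary World" has good ratings, the film adapted from "The Hunchback of Notre Dame" is good, and "1984" is also well worth watching). The first two titles are enclosed in the Chinese book title marks 《》, while 1984 is enclosed in <<>>.
How can regexp_extract and regexp_replace be used to extract all the book titles from this text?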
select substr(
regexp_replace(
regexp_extract(
regexp_replace(regexp_replace('《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。','<<','《'),'>>','》')
,'(.*》)',1)
,'.*?(《[^》|^《]+》)',',$1')
,2) as books
;
Code analysis:
step1: The two nested regexp_replace() calls convert << to 《 and >> to 》, so all three titles are enclosed in the same book title marks.
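If no Hive session is available, step 1 can be mirrored in Python as a rough sketch. It assumes that Python's re module handles '<<' and '>>' the same way Hive's regexp_replace does. That holds here because neither pattern contains a regex metacharacter. normalize_marks is a hypothetical helper name.

```python
import re

# Input sentence: 1984 is wrapped in << >> instead of 《 》.
text = '《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。'


def normalize_marks(s: str) -> str:
    """Hypothetical mirror of the two nested regexp_replace() calls in step 1."""
    s = re.sub('<<', '《', s)  # inner call: << becomes 《
    return re.sub('>>', '》', s)  # outer call: >> becomes 》


normalized = normalize_marks(text)
print(normalized)  # 《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》也蛮好看。
```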
step2: regexp_extract with the pattern '(.*》)' returns capture group 1 of the match. Because .* is greedy, the match extends to the last 》, so the main purpose of this step is to remove the text after the last book title mark 》.
select
regexp_extract(
regexp_replace(regexp_replace('《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。','<<','《'),'>>','》')
,'(.*》)',1)
;
The result at this point is:
《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》
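Here is a Python sketch of step 2. It assumes that re.search, which returns the first match anywhere in the string, is a fair stand-in for the way regexp_extract locates its match. The helper name regexp_extract and its '' fallback when nothing matches are assumptions of this sketch, not Hive's actual implementation.

```python
import re

# Text after step 1: all three titles now use 《 》.
normalized = '《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》也蛮好看。'


def regexp_extract(s: str, pattern: str, index: int) -> str:
    """Return group `index` of the first match of `pattern` in `s`, or '' if nothing matches."""
    m = re.search(pattern, s)
    return m.group(index) if m else ''


# The greedy .* runs to the last 》, so the trailing 也蛮好看。 is dropped.
trimmed = regexp_extract(normalized, '(.*》)', 1)
print(trimmed)  # 《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》
```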
step3: regexp_replace
Replace the text in front of each book title with a comma, keeping the title itself:
-- $1 refers to the text matched by the first pair of parentheses (capture group 1)
select
regexp_replace(
'《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》'
,'.*?(《[^》|^《]+》)',',$1')
;
The result at this point is:
,《平凡的世界》,《巴黎圣母院》,《1984》
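The following Python sketch reproduces step 3, assuming Python's re behaves like Hive's Java-based regex for this pattern. The pattern is copied unchanged. In the replacement string, Python writes the backreference as \g<1> where Hive uses $1. Inside the character class [^》|^《], the | and the second ^ are ordinary characters. The simpler class [^《》] would express the intent more directly and gives the same result for this input.

```python
import re

# Output of step 2.
trimmed = '《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》'

# Same pattern as the SQL: a lazy .*? followed by one 《...》 title captured in group 1.
pattern = '.*?(《[^》|^《]+》)'

# Each match (filler text + title) is replaced by a comma plus the title; \g<1> plays the role of $1.
with_commas = re.sub(pattern, r',\g<1>', trimmed)
print(with_commas)  # ,《平凡的世界》,《巴黎圣母院》,《1984》
```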
Points to note:
1) The regular expression uses non-greedy matching, .*?. With greedy matching, .*, the very first match would extend all the way to the last title, and the final result would be only
,《1984》
2) If step 2 is skipped, the text after the last 》 is never matched, so it remains in the result:
select
regexp_replace(
regexp_replace(regexp_replace('《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。','<<','《'),'>>','》')
,'.*?(《[^》|^《]+》)',',$1')
;
The result at this point is:
,《平凡的世界》,《巴黎圣母院》,《1984》也蛮好看。
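Both notes can be checked side by side in a short Python sketch. It assumes Python's re treats these patterns the same way Hive's Java-based regex does. The variable names are illustrative only.

```python
import re

TITLE = '《[^》|^《]+》'  # same title pattern as the SQL
trimmed = '《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有《1984》'  # output of step 2
normalized = trimmed + '也蛮好看。'  # output of step 1

# Note 1: greedy .* lets the first match swallow everything up to the last title.
greedy = re.sub('.*(' + TITLE + ')', r',\g<1>', trimmed)
# Note 2: without step 2, the text after the last 》 is never matched and survives.
no_step2 = re.sub('.*?(' + TITLE + ')', r',\g<1>', normalized)

print(greedy)    # ,《1984》
print(no_step2)  # ,《平凡的世界》,《巴黎圣母院》,《1984》也蛮好看。
```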
step4: substr
Drop the leading comma. Hive's substr is 1-based, so substr(s, 2) keeps everything from the second character on:
select substr(',《平凡的世界》,《巴黎圣母院》,《1984》',2)
;
The final result is:
《平凡的世界》,《巴黎圣母院》,《1984》
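The four steps can be combined into one Python function. This is a sketch that assumes Python's re matches Hive's Java regex behavior for these patterns. extract_titles is a hypothetical name. Hive's substr(s, 2) becomes s[1:] because Python slices count from 0.

```python
import re

TITLE = '《[^》|^《]+》'  # title pattern from the SQL


def extract_titles(text: str) -> str:
    """Hypothetical Python mirror of the Hive query: returns the titles joined by commas."""
    # step 1: normalize << >> to 《 》
    s = re.sub('>>', '》', re.sub('<<', '《', text))
    # step 2: keep everything up to the last 》 ('' if there is none)
    m = re.search('(.*》)', s)
    s = m.group(1) if m else ''
    # step 3: replace the text in front of each title with a comma
    s = re.sub('.*?(' + TITLE + ')', r',\g<1>', s)
    # step 4: drop the leading comma; Hive's substr(s, 2) is 1-based, Python's s[1:] is 0-based
    return s[1:]


text = '《平凡的世界》评分不错,《巴黎圣母院》改变成的电影不错,还有<<1984>>也蛮好看。'
print(extract_titles(text))  # 《平凡的世界》,《巴黎圣母院》,《1984》
```

As in the SQL version, the result is a single comma-separated string rather than one row per title.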