Speed up regular expression

Michael Böckling :

This is a regex to extract the table name from a SQL statement:

(?:\sFROM\s|\sINTO\s|\sNEXTVAL[\s\W]*|^UPDATE\s|\sJOIN\s)[\s`'"]*([\w\.-_]+)

It matches a token, optionally enclosed in [`'"], preceded by FROM etc. surrounded by whitespace, except for UPDATE which has no leading whitespace.
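To make the description above concrete, here is a minimal sketch that runs the pattern against a few made-up statements. The question does not say which regex engine is used, so Python's `re` module is an assumption here, and the helper name `first_table` and the sample SQL strings are purely illustrative.

```python
import re

# The question's pattern, copied verbatim. Side note: inside the last
# character class, ".-_" is parsed as a range from "." to "_".
ORIGINAL = re.compile(
    r"""(?:\sFROM\s|\sINTO\s|\sNEXTVAL[\s\W]*|^UPDATE\s|\sJOIN\s)[\s`'"]*([\w\.-_]+)"""
)


def first_table(sql):
    """Return the first captured table name, or None if nothing matches."""
    match = ORIGINAL.search(sql)
    return match.group(1) if match else None


# Hypothetical sample statements, not taken from the linked paste.
print(first_table("SELECT a FROM users WHERE id = 1"))       # users
print(first_table("UPDATE `orders` SET x = 1"))              # orders
print(first_table("INSERT INTO schema.tbl (a) VALUES (1)"))  # schema.tbl
```

With `search`, the engine retries the whole alternation at every character position until one branch succeeds, which may be part of why the pattern is slow on long inputs.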

We execute many regexes, and this is the slowest one; I'm not sure why. SQL strings can be up to 4k in size, and the worst-case execution time is 0.35 ms on a 2.2 GHz i7 MBP.

This is a slow input sample: https://pastebin.com/DnamKDPf

Can we do better? Splitting it up into multiple regexes would also be an option, if the alternation is the issue.

revo :

There is a rule of thumb:

Do not let the engine attempt a match at every single character when there are boundaries it can use instead.

Try the following regex (~2500 steps on the given input string):

(?!FROM|INTO|NEXTVAL|UPDATE|JOIN)\S*\s*|\w+\W*(\w[\w\.-]*)

Note: What you need is in the first capturing group.
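A rough sketch of how this pattern might be consumed, again assuming Python's `re` (the thread does not name the engine). The pattern is meant to be applied repeatedly: the first alternative swallows one non-keyword token plus trailing whitespace and leaves group 1 unset, while the second alternative fires at a keyword and captures the name that follows. The helper `table_names` and the sample statements are hypothetical.

```python
import re

# The answer's first pattern, copied verbatim.
TOKENWISE = re.compile(
    r"(?!FROM|INTO|NEXTVAL|UPDATE|JOIN)\S*\s*|\w+\W*(\w[\w\.-]*)"
)


def table_names(sql):
    """Collect group 1 from every match in which it participated."""
    return [
        m.group(1)
        for m in TOKENWISE.finditer(sql)
        if m.group(1) is not None  # skip the "consume one token" matches
    ]


# Hypothetical sample statements, not taken from the linked paste.
print(table_names("SELECT a FROM users u JOIN orders o ON u.id = o.uid"))
# ['users', 'orders']
print(table_names("INSERT INTO schema.tbl (a) VALUES (1)"))
# ['schema.tbl']
```

Because each match begins where the previous one ended, the engine advances a whole token at a time instead of restarting at every character, which is what the rule of thumb above is getting at.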

The final regex, revised according to the comments (it is slightly slower than the cleaner one above):

(?!(?:FROM|INTO|NEXTVAL|UPDATE|JOIN)\b)\S*\s*|\b(?:NEXTVAL\W*|\w+\s[\s`'"]*)([\[\]\w\.-]+)
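The comments that motivated the revision are not reproduced here, so the following is only an observation on the two patterns as written, sketched with Python's `re` as an assumed engine. One visible difference is the `\b` after the keyword group, which stops an identifier that merely starts with a keyword from being treated as one; the revised pattern also skips quote characters before the name and accepts square brackets inside it. The names `EARLIER`, `FINAL`, and `names` and the sample statements are hypothetical.

```python
import re

# Both patterns copied verbatim from the answer.
EARLIER = re.compile(
    r"(?!FROM|INTO|NEXTVAL|UPDATE|JOIN)\S*\s*|\w+\W*(\w[\w\.-]*)"
)
FINAL = re.compile(
    r"""(?!(?:FROM|INTO|NEXTVAL|UPDATE|JOIN)\b)\S*\s*|\b(?:NEXTVAL\W*|\w+\s[\s`'"]*)([\[\]\w\.-]+)"""
)


def names(pattern, sql):
    """Collect group 1 from every match in which it participated."""
    return [m.group(1) for m in pattern.finditer(sql) if m.group(1) is not None]


# A column name that begins with the keyword UPDATE (made-up input).
sql = "SELECT UPDATED_AT FROM `users`"
print(names(EARLIER, sql))  # ['FROM']  (UPDATED_AT is mistaken for a keyword)
print(names(FINAL, sql))    # ['users']

print(names(FINAL, "SELECT NEXTVAL('seq_orders')"))  # ['seq_orders']
print(names(FINAL, "SELECT * FROM [dbo].[users]"))   # ['[dbo].[users]']
```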
