Take off! Finally, we can say goodbye to handwritten regular expressions for good.

The purpose of this article is to help you get working regular expressions without having to write them out yourself.

Regular expressions have always been a headache for me. I rarely use them, and whenever I do, I find I cannot remember the rules at all: \d means a digit, . means any character except a newline, \s means a whitespace character, \w means a word character including the underscore, that is [A-Za-z0-9_], and [\s\S]* matches any string, including newlines.
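As a quick illustration of those rules, here is a minimal Java sketch. The class and method names (CharClassDemo, fullMatch) are made up for this example; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d", "7"));           // a single digit
        System.out.println(fullMatch("\\s", "\t"));          // a whitespace character
        System.out.println(fullMatch("\\w+", "user_01"));    // letters, digits, underscore
        System.out.println(fullMatch(".*", "a\nb"));         // '.' does not cross a newline by default
        System.out.println(fullMatch("[\\s\\S]*", "a\nb"));  // matches across newlines
    }
}
```

Note that in Java source each backslash has to be doubled inside a string literal, which is one more thing to forget.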

Can you remember all of this? If you can, I really admire you. I certainly can't. Every time I wrote a regex by hand, I had to look the rules up one by one, like consulting a dictionary. It was very painful.

The process often looks like this:

1. Open Google, search for regular expressions, and find a dictionary-style reference tutorial. Read it for a few minutes to refresh your memory, although it may not all come back.

2. Then start writing a regular expression according to your needs.

3. Put it into the program and run it.

4. Hey, why doesn't it work? It doesn't match. So modify the regex.

5. Repeat steps 3 and 4 until luck strikes and the results come out right.

That was how it went in the early days. It really all came down to a little skill and a lot of luck.

I remember that not long after I graduated, my team lead assigned me a task: extract the data we needed from a pile of PDF files. When you read a PDF's content programmatically, it is just one big block of text. If you want to pull accurate data out of a bunch of files with inconsistent content, your first instinct is to use regular expressions.

My approach at the time followed steps 1-5 above. On top of that, I had just graduated and was still pretty green, so I stumbled my way through writing the program. Several times while the program was running, VS (Visual Studio) became extremely sluggish. Yes, the so-called most powerful IDE in the universe. I was still writing C# back then, and even the most powerful IDE in the universe froze badly.

At the time, I simply assumed there was something wrong with the regex, so I kept changing it.

I found out later that the regex was badly written and triggered backtracking. The worse the pattern, the more severe the backtracking. On top of that, the PDFs contained a lot of text, so it is no wonder the development tools froze.
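To make the problem concrete, here is a minimal Java sketch of a backtracking-prone pattern next to a rewrite that accepts exactly the same strings. The patterns and names (BacktrackingDemo, RISKY, SAFER) are illustrative assumptions, not the regex from the PDF project, and the measured times depend on the JDK version and the input length.

```java
import java.util.regex.Pattern;

class BacktrackingDemo {
    // Nested quantifiers: a run of 'a's can be split between the inner
    // and outer '+' in many different ways.
    static final Pattern RISKY = Pattern.compile("(a+)+b");
    // Accepts the same strings, but there is only one way to consume the run.
    static final Pattern SAFER = Pattern.compile("a+b");

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // A run of 'a's with no trailing 'b': a backtracking engine may try
        // many splits of the run before it can report failure.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 22; i++) {
            sb.append('a');
        }
        String input = sb.append('c').toString();

        for (Pattern p : new Pattern[] {RISKY, SAFER}) {
            long start = System.nanoTime();
            boolean ok = matches(p, input);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(p.pattern() + " -> " + ok + " in " + micros + " us");
        }
    }
}
```

The usual fix is the one shown: remove the ambiguity so the engine has only one way to match each piece of the input.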

For more on backtracking, see the article "Runaway Regular Expressions: Catastrophic Backtracking".

Later, once I was a little less green, I learned about online regex websites that list commonly used regular expressions, so I no longer had to fiddle with them myself. Search Google for the keyword "Regular Expression Online" and plenty of them come up. You can use the common patterns they list directly, such as mobile phone numbers, email addresses, and website addresses, which covers roughly 90% of scenarios.

The other 10% used to be something you had to work out yourself. But it is 2023 now, and 99% of the time you no longer need to do it by hand. Of course, if you are an expert and want to write it yourself, that is no problem at all.

ChatGPT: the perfect solution

ChatGPT is a product built on an LLM (Large Language Model), and what it does best is analyze language. What are regular expressions used for? They find what we need in a large body of text according to rules we define. In other words, they are also a form of text processing, so using ChatGPT to solve regular expression problems is simply a perfect match.

For example, the simplest case is matching Chinese mobile phone numbers: ask ChatGPT and it writes the regex directly, and even writes the code for you.
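ChatGPT's actual output is not reproduced here, so the following is only a sketch of what such an answer typically looks like. It assumes a simplified rule (11 digits, starting with 1, second digit 3-9) rather than the official numbering plan, and the names (MobileNumberDemo, isMobile, extract) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MobileNumberDemo {
    // Assumption: 11 digits, first digit 1, second digit 3-9.
    static final Pattern WHOLE = Pattern.compile("1[3-9]\\d{9}");
    // Lookarounds keep a match from starting or ending inside a longer digit run.
    static final Pattern EMBEDDED = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");

    // Validates that the entire string is a mobile number.
    static boolean isMobile(String s) {
        return WHOLE.matcher(s).matches();
    }

    // Collects every mobile number embedded in a longer text.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMBEDDED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isMobile("13812345678"));
        System.out.println(isMobile("12812345678"));
        System.out.println(extract("Call 13812345678 or 15900001111, not 1234567890123."));
    }
}
```

Validation uses matches() so the whole string must fit, while extraction uses find() to scan a longer text.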

As for website addresses, email addresses, and so on, they are no longer a problem either.

ChatGPT is not the only one that can do this; Baidu's Wenxin Yiyan can as well. But although Wenxin Yiyan can generate the regex, it gets confused when you ask the question in reverse.

For example, if I ask whether [email protected] is a valid email address, ChatGPT will tell you that it is valid, but Wenxin Yiyan cannot tell you.

The following is ChatGPT’s answer:

ChatGPT's answer

The following is Wenxin Yiyan's answer:

Wenxin Yiyan's answer

It is not just email addresses. If you ask it whether a mobile phone number is valid, Wenxin Yiyan skips the question entirely and instead tells you which region the number belongs to, and even that is wrong.

This shows what intelligence looks like, as opposed to merely having a lot of data. ChatGPT is clearly the more intelligent of the two. I hope that domestic large models can catch up in the next two years.

Another example

Matching a certain part of a piece of HTML is also a common scenario for regular expressions. Anyone who has written a web crawler has used regular expressions to some extent.

For example, I have this fragment in a large piece of HTML:

<div class="time">这是一个,this is some</div>

Now I want to get the content of this div. Of course there are many other ways to do it, such as jsoup in Java, XPath, or CSS selectors. But if you want to use a regular expression, do you write it yourself? That feels like a lot of trouble.

So let's ask ChatGPT and see how it does.

I just asked it directly:

<div> <div class="outer"> <div class="time">这是一个,this is some</div> <div class="button">button</div> </div> </div>, use Java regular expressions to match the Text part of the tag with class="time" in this HTML.

I took the code and ran it as-is, without any problems.
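The generated code itself is not reproduced here, so the following is only a sketch of what such an answer might look like. The class and method names (DivTextDemo, timeText) are hypothetical, and the pattern assumes the div contains plain text with no nested tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivTextDemo {
    // Captures the text between <div class="time"> and its closing tag.
    // [^<]* stops at the first '<', so the match cannot run past </div>.
    // Assumption: the div holds plain text only, with no nested tags.
    static final Pattern TIME_DIV =
            Pattern.compile("<div\\s+class=\"time\"\\s*>([^<]*)</div>");

    // Returns the captured text, or null when no such div is found.
    static String timeText(String html) {
        Matcher m = TIME_DIV.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The Chinese characters of the sample are written as Unicode escapes
        // so the source file compiles regardless of its encoding.
        String html = "<div> <div class=\"outer\"> "
                + "<div class=\"time\">\u8fd9\u662f\u4e00\u4e2a,this is some</div> "
                + "<div class=\"button\">button</div> </div> </div>";
        System.out.println(timeText(html));
    }
}
```

For anything more deeply nested than this, an HTML parser such as jsoup is usually the more robust choice.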

Some readers may say: for such an obvious tag, why bother with ChatGPT? I could just write it myself.

This is only an example. If you have more complicated matching logic, you can try it with ChatGPT too. Basically, 99% of cases can be solved directly.

Another great website

If you cannot or do not want to use ChatGPT, and you do not want to use Baidu's Wenxin Yiyan either, I also found a website. I strongly suspect that it is connected to ChatGPT behind the scenes. It also accepts a natural language description and returns the corresponding regular expression.

Website address: Regular expression visualization-Visual Regexp: online testing, learning, and building regular expressions

For example, I told it: extract the Chinese mobile phone number from a string. It also provides a visualization of the regex.

I also tried the above example of matching HTML on this website, and the result was OK.

I am just sharing a good tool; I have no affiliation with this website.

A website that helps you analyze regular expressions

Next is a website for when you want a deeper understanding of regular expressions, or want to see how a regex that you wrote or that ChatGPT generated actually works and whether it performs well.

Website address: regex101: build, test, and debug regex

On the left side of the website, you can select your target language, that is, the language your code is written in, such as Java or JavaScript.

The upper middle is the regular expression, and the lower middle is the content to be matched.

The upper right shows a complete analysis of how the regex you wrote matches. It is very detailed, and you can clearly see which paths the regex takes as it matches.

The lower right shows the final matching result.

If the regular expression you wrote causes obvious backtracking, a prompt appears here to tell you about the problem so that you can optimize it.

Summary

As the old saying goes, a gentleman is good at making use of tools. I may not be very skilled, but the tools are easy to use. Me plus easy-to-use tools means that I am pretty capable too.

Give these tools a try, and if you find them useful, recommend them to your friends.

Origin blog.csdn.net/wdj_yyds/article/details/132512793