NLP models now have a "teacher" too! Install this open-source library and fix grammatical errors in 1 millisecond

By Yang Jing, reporting from Aofei Temple
QbitAI report | WeChat official account: QbitAI

What do you do when the text from an NLP model contains grammatical errors?

For example: He wants that you send him an email.

No problem. Fixing it can now be as easy as your childhood English teacher marking an essay.

Just install a library dedicated to correcting grammatical errors, and it picks them out in milliseconds.

The fix is to drop "that" and add "to" after "you".

The corrected sentence reads:

He wants you to send him an email.

(Surely nobody missed the grammatical error in that one.)

Here is another sentence:

I can due his homework.

It should become: I can do his homework.

These errors may seem a bit too obvious, so let's try something a little harder:

Thanks for your’s and Lucy’s help.

The system changes it to:

Thanks for yours and Lucy’s help.

This is NLPRule, a library for fast grammatical error correction that AI undergraduate Benjamin Minixhofer recently built during his vacation.

Written in Rust, it is built by reverse-engineering the grammar rules of LanguageTool.

LanguageTool is an open source proofreading software for English, French, German, Polish, Russian and more than 20 other languages. It can find many errors that cannot be detected by the spell checker.
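LanguageTool expresses many of its grammar rules as sequences of token patterns (written in XML), each with a message and a suggested replacement. The Python sketch below is a heavily simplified, hypothetical imitation of that idea, not the actual rule format or API of NLPRule or LanguageTool: `PatternRule`, `apply_rule` and the rule id `MODAL_DUE_DO` are invented for illustration, and the tokenizer is deliberately naive.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternRule:
    """Hypothetical, simplified stand-in for a LanguageTool-style pattern rule."""
    rule_id: str
    pattern: list[str]     # one regex per token, matched case-insensitively
    suggestion: list[str]  # replacement tokens; "\1", "\2", ... copy matched tokens
    message: str


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: words (allowing one inner apostrophe) or single punctuation marks.
    return re.findall(r"\w+(?:['’]\w+)?|[^\w\s]", text)


def detokenize(tokens: list[str]) -> str:
    # Join with spaces, then drop the space in front of punctuation.
    return re.sub(r" ([^\w\s])", r"\1", " ".join(tokens))


def apply_rule(text: str, rule: PatternRule) -> str:
    tokens = tokenize(text)
    n = len(rule.pattern)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + n]
        if len(window) == n and all(
            re.fullmatch(p, tok, re.IGNORECASE) for p, tok in zip(rule.pattern, window)
        ):
            for s in rule.suggestion:
                ref = re.fullmatch(r"\\(\d+)", s)
                # "\k" copies the k-th matched token (1-based); anything else is literal.
                out.append(window[int(ref.group(1)) - 1] if ref else s)
            i += n  # skip past the matched span
        else:
            out.append(tokens[i])
            i += 1
    return detokenize(out)


# Hypothetical rule: a modal verb followed by "due" probably meant "do".
CAN_DUE = PatternRule(
    rule_id="MODAL_DUE_DO",
    pattern=[r"can|could|will|would|should|must", r"due"],
    suggestion=[r"\1", "do"],
    message="Did you mean 'do'?",
)

print(apply_rule("I can due his homework.", CAN_DUE))
# → I can do his homework.
```

Because the rule requires a modal verb in front of "due", a correct sentence such as "Taxes are due today." passes through unchanged.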

Soon after its release, the Reddit post announcing it drew more than 200 upvotes.

How does it work?

NLPRule combines grammar rules with ML models and is mainly intended for NLP preprocessing and NLG postprocessing.

According to the author, a rule-based approach has two major advantages.

The first is speed: on the author's 8th-generation Intel CPU, correcting a sentence takes less than 1 millisecond.

The second is that training data for grammatical errors is extremely scarce, which limits what ML models can handle.

Take the sentence "It is enough for all intensive purposes."

It contains an error: "intensive purposes" should be "intents and purposes". Unless a model is specifically trained for it, an ML model basically cannot correct this error, because it almost never appears in the training data.
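The rule-based side of this argument can be illustrated with a single hand-written rule. In the hypothetical sketch below (`PHRASE_RULES` and `correct` are invented for illustration and are not part of NLPRule), one regular expression fixes the phrase wherever it appears, with no training data involved:

```python
import re

# Hypothetical rule list of (pattern, correct phrase) pairs. One hand-written
# entry covers an error that rarely shows up in ML training data.
PHRASE_RULES = [
    (re.compile(r"\bfor all intensive purposes\b", re.IGNORECASE),
     "for all intents and purposes"),
]


def correct(sentence: str) -> str:
    for pattern, fix in PHRASE_RULES:
        def repl(m: re.Match, fix: str = fix) -> str:
            # Keep an initial capital if the matched phrase started with one.
            return fix[0].upper() + fix[1:] if m.group(0)[0].isupper() else fix
        sentence = pattern.sub(repl, sentence)
    return sentence


print(correct("It is enough for all intensive purposes."))
# → It is enough for all intents and purposes.
```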

For languages other than English, such as Chinese, this kind of data is even scarcer.

The author's goal in creating this library was a fast, lightweight engine for running natural-language rules without depending on the JVM (Java Virtual Machine), which LanguageTool runs on, and its speed and memory overhead.
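A per-sentence latency claim like "under 1 millisecond" is easy to check on your own hardware for any correction function. The sketch below times a stand-in corrector built from two regular expressions; `toy_correct` and `mean_latency_ms` are hypothetical names, and the numbers it prints say nothing about NLPRule's actual performance.

```python
import re
import time
from typing import Callable


def mean_latency_ms(correct: Callable[[str], str], sentences: list[str],
                    repeats: int = 100) -> float:
    """Average wall-clock time per sentence, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for s in sentences:
            correct(s)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(sentences)) * 1000


# Stand-in corrector (hypothetical): two regex rules, not NLPRule.
RULES = [
    (re.compile(r"\bwants that you (\w+)"), r"wants you to \1"),
    (re.compile(r"\bcan due\b"), "can do"),
]


def toy_correct(sentence: str) -> str:
    for pattern, fix in RULES:
        sentence = pattern.sub(fix, sentence)
    return sentence


SENTENCES = ["He wants that you send him an email.", "I can due his homework."]
print(toy_correct(SENTENCES[0]))  # He wants you to send him an email.
print(f"{mean_latency_ms(toy_correct, SENTENCES):.4f} ms per sentence")
```

Repeating the batch many times and dividing by the total count smooths out timer resolution, which matters when a single call finishes in microseconds.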

Currently, this library supports English and German.


The specific installation can be divided into the following four steps, the details can be stamped at the end of the article link.

Testing on GPT-2 text

The author then ran NLPRule on text generated by GPT-2, and it produced many suggested improvements.

One example is a grammatical error:

Before: …t out, as a condition of its being operated. Each lock keeper should ensure that all locks are operated and tha…

After: …t out, as a condition of its being operated. Each lockkeeper should ensure that all locks are operated and tha…

Message: This noun is normally spelled as one word.

Type: grammar

Another example is flagged as a misspelling:

Before: …he Z-machine version (in the standardised format) is comprised of 32 (in total) bytes, one per line. …

After: …he Z-machine version (in the standardised format) comprises 32 (in total) bytes, one per line. …

Message: Did you mean comprises or consists of or is composed of?

Type: misspelling
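Each suggestion above pairs a flagged span with a replacement, a message and a type. A minimal sketch of how such suggestions could be represented and applied, assuming character offsets and non-overlapping spans (the `Suggestion` class and `apply_suggestions` function are hypothetical and are not NLPRule's API):

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical record mirroring the fields shown in the examples."""
    start: int         # character offset where the flagged span begins
    end: int           # character offset just past the flagged span
    replacement: str
    message: str
    kind: str          # e.g. "grammar" or "misspelling"


def apply_suggestions(text: str, suggestions: list[Suggestion]) -> str:
    # Apply right to left so the offsets of earlier spans stay valid.
    # Assumes the spans do not overlap.
    for s in sorted(suggestions, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + s.replacement + text[s.end:]
    return text


text = "Each lock keeper should ensure that all locks are operated."
start = text.index("lock keeper")
fix = Suggestion(start, start + len("lock keeper"), "lockkeeper",
                 "This noun is normally spelled as one word.", "grammar")
print(apply_suggestions(text, [fix]))
# → Each lockkeeper should ensure that all locks are operated.
```

Applying edits from right to left is the design choice that matters here: each replacement can change the length of the text, so working backwards keeps the offsets of the remaining suggestions correct.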

After the post went up, many users exclaimed: "Fantastic!"

Others wondered whether combining it with BERT or other Transformer models would produce even better sentences.

Want more details? The links are below.

Reference link:
https://www.reddit.com/r/MachineLearning/comments/kzuaie/p_i_made_nlprule_a_library_for_fast_grammatical/

GitHub URL: https://github.com/bminixhofer/nlprule


This article is original content from the QbitAI account of the NetEase News / NetEase Featured Content Incentive Program. Reprinting without authorization is prohibited.


Source: blog.csdn.net/QbitAI/article/details/112855392