Recently, in my spare time, I have been studying how to use Java to simulate browser behavior. While experimenting with the login step, I ran into the problem of recognizing a verification code (CAPTCHA), so I searched online for how to do this in Java. None of the articles I found matched my configuration, so I am writing this post to record the pitfalls I hit and how I solved them.
For image recognition you can use Tesseract-OCR directly, but that approach requires downloading the software and setting up the environment on your computer, so it is not very portable. With Tess4J, you only need to download the relevant Jar package and import it into the project, and the packaged project can then run anywhere (on macOS, the native Tesseract library still has to be installed, as described below).
First, here is the environment I used:
- Computer: MacBook
- JDK version: 1.8
The following steps are required:
- Import the Tess4J Jar package
- Install tesseract with brew
- Download the language packs
These three simple steps are all it takes to recognize image verification codes with Java on your machine. The sections below cover each step in detail.
Import Tess4J
If you use Maven, add the following dependency:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>
If you use Gradle (older Gradle versions use compile instead of implementation):
implementation 'net.sourceforge.tess4j:tess4j:3.2.1'
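If OCR code later fails with a missing-class error, it helps to first confirm that the dependency actually reached the runtime classpath. The sketch below is a hypothetical diagnostic (the class name Tess4jClasspathCheck is made up for illustration) that only uses the JDK's Class.forName to look up the Tess4J class used later in this post, without running its static initializers.

```java
// Hypothetical diagnostic: reports whether the Tess4J classes can be loaded,
// i.e. whether the Maven/Gradle dependency made it onto the classpath.
class Tess4jClasspathCheck {

    // Fully qualified name of the Tess4J class used in the OCR example
    static final String TESSERACT_CLASS = "net.sourceforge.tess4j.Tesseract";

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only look the class up, do not run static initializers
            Class.forName(className, false, Tess4jClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or present but with an unresolvable dependency
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Tess4J on classpath: " + isOnClasspath(TESSERACT_CLASS));
    }
}
```

This only checks the Java side; the native Tesseract library installed in the next step is a separate requirement.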
Install tesseract with brew
It can be installed with a single command:
brew install tesseract
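To check from Java whether the brew-installed tesseract executable is visible, you can scan the PATH environment variable. The following is a minimal sketch with a hypothetical helper name (PathLookup.findOnPath); it assumes macOS or Linux, where the executable has no .exe suffix. Note that brew links binaries into /usr/local/bin on Intel Macs and /opt/homebrew/bin on Apple Silicon, and a Java process started from an IDE may see a different PATH than your terminal.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: looks for an executable such as "tesseract" on the PATH.
class PathLookup {

    static Optional<Path> findOnPath(String name) {
        String pathEnv = System.getenv("PATH");
        if (pathEnv == null) {
            return Optional.empty();
        }
        // PATH entries are separated by ':' on macOS/Linux (';' on Windows)
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, name);
                // isRegularFile follows symlinks, which brew uses for installed binaries
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed PATH entries
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> tesseract = findOnPath("tesseract");
        System.out.println(tesseract.isPresent()
                ? "tesseract found at " + tesseract.get()
                : "tesseract not found on PATH");
    }
}
```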
However, brew downloaded extremely slowly for me. After some searching, I found that the fix is to switch brew to a faster download mirror:
# Step 1
cd "$(brew --repo)"
git remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git
# Step 2
cd "$(brew --repo)/Library/Taps/homebrew/homebrew-core"
git remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git
# Step 3
brew update
Note that brew update takes a while to finish. Once it completes, brew install runs much faster and no longer stalls for long periods without any progress. The mirror switch is now complete.
To restore the original sources:
cd "$(brew --repo)"
git remote set-url origin https://github.com/Homebrew/brew.git
cd "$(brew --repo)/Library/Taps/homebrew/homebrew-core"
git remote set-url origin https://github.com/Homebrew/homebrew-core
brew update
Download language packs
Download the language packs from GitHub (see the language pack download address), decompress them, and put them in a directory of your choice. Then write the following code:
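Before pointing Tess4J at the language pack directory, it can save time to confirm which languages it actually contains. The sketch below assumes the packs are Tesseract's standard "<lang>.traineddata" files placed directly in one directory; the class name LanguagePackCheck and the placeholder path are hypothetical. Tess4J uses English ("eng") unless setLanguage is called, so eng.traineddata should normally be in the list.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the language codes found in a language-pack directory.
class LanguagePackCheck {

    static final String SUFFIX = ".traineddata";

    static List<String> listLanguages(Path dir) throws IOException {
        List<String> languages = new ArrayList<>();
        // Only visit files matching *.traineddata; anything else in the folder is ignored
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
            for (Path file : stream) {
                String name = file.getFileName().toString();
                // "eng.traineddata" -> "eng"
                languages.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(languages);
        return languages;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder: the directory holding the downloaded .traineddata files
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/path/to/language-packs");
        System.out.println("Available languages: " + listLanguages(dataDir));
    }
}
```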
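import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// The class name is arbitrary
public class OcrDemo {

    // Runs OCR on the image at imageLocation and returns the recognized text
    public static String getImgText(String imageLocation) {
        ITesseract instance = new Tesseract();
        // Path of the directory where the language packs are stored
        instance.setDatapath("/path/to/language-packs");
        try {
            return instance.doOCR(new File(imageLocation));
        } catch (TesseractException e) {
            // Report the cause instead of silently discarding it
            System.err.println(e.getMessage());
            return "Error while reading image";
        }
    }

    public static void main(String[] args) {
        // Path of the image to recognize
        System.out.println(getImgText("/path/to/image.png"));
    }
}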
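Raw OCR output typically contains line breaks and may include stray spaces or punctuation, while a verification code usually has to be submitted as a compact string. The sketch below is a hypothetical post-processing helper (OcrTextCleaner is not part of Tess4J) that assumes the code consists only of ASCII letters and digits. Apply it only to successful results, since the "Error while reading image" fallback string would otherwise be mangled into a plausible-looking answer.

```java
// Hypothetical helper: normalizes raw OCR text into a verification-code string,
// assuming the code is made of ASCII letters and digits only.
class OcrTextCleaner {

    static String cleanCaptchaText(String ocrText) {
        if (ocrText == null) {
            return "";
        }
        // Drop whitespace, line breaks and any character that is not a letter or digit
        return ocrText.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // Made-up input shaped like raw OCR output
        System.out.println(cleanCaptchaText(" 7a K3\n")); // prints "7aK3"
    }
}
```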
Now we can use Java for image recognition. Take the following picture as an example:
Recognizing it directly produces the following output:
Later, I found that this approach is still not good enough for recognizing verification codes: most of them use hollow or irregular characters that Tesseract cannot recognize reliably, so another approach is still needed.
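One common first step people try with noisy CAPTCHAs is to convert the image to pure black and white before OCR. This is not something the original experiment tested, and it will not by itself fix hollow or heavily distorted characters; it is shown only as a sketch of fixed-threshold binarization using the JDK's BufferedImage. The class name CaptchaBinarizer and the threshold of 128 are hypothetical choices. The resulting image could then be saved and passed to getImgText, or handed to Tess4J directly, since ITesseract also has a doOCR overload that accepts a BufferedImage.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical preprocessing step: fixed-threshold binarization of an image.
class CaptchaBinarizer {

    static final int BLACK = 0xFF000000;
    static final int WHITE = 0xFFFFFFFF;

    static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Approximate perceived brightness (ITU-R BT.601 luma weights)
                int gray = (int) (0.299 * r + 0.587 * g + 0.114 * b);
                // Dark pixels become black, everything else white
                out.setRGB(x, y, gray < threshold ? BLACK : WHITE);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input image path, args[1]: output PNG path (both placeholders)
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported image format: " + args[0]);
        }
        ImageIO.write(binarize(src, 128), "png", new File(args[1]));
    }
}
```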
Source code for this project