Finding six consecutive digits in the first three lines of a string

Kushal Mishra :

I have written an OCR program in Java that scans documents and extracts all the text in them. My primary task is to find the invoice number, which is 6 or more digits long.

I tried using substring, but that does not work well because the position of the number changes with every document. It is, however, always present in the first three lines of the OCR text.

I want to write code in Java 8 that iterates through the first three lines and extracts these 6 consecutive digits.

I am using Tesseract for OCR.

Example:

,——— ————i_
g DAILYW RK SHE 278464
E C 0 mp] on THE POUJER Hello, Mumbai, Co. Maha

From this, I need to extract the number 278464.

Please help!!

Xiao Yu :

Try the following code, which uses a regex.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
  public static void main(String[] args)
  {
    // Six digits that are neither preceded nor followed by another digit
    Pattern pattern = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");
    String str = "g DAILYW RK SHE 278464";
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        String s = matcher.group();
        // prints 278464
        System.out.println(s);
    }
  }
}
  • (?<!\\d) is a negative lookbehind: it is checked but not included in the match, and requires that the character before the match is not a digit (the start of the string also qualifies)
  • \\d{6} matches exactly 6 digits (use \\d{6,} to accept 6 or more digits)
  • (?!\\d) is a negative lookahead: it is checked but not included in the match, and requires that the character after the match is not a digit
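
Since the question asks for the first three lines and for 6 or more digits, the same idea can be applied line by line. The following is only a minimal sketch under a few assumptions: the class name InvoiceNumberFinder and the helper findInvoiceNumber are hypothetical, the OCR output is assumed to be available as a single String, the first run of 6 or more digits is assumed to be the invoice number, and the sample text is adapted from the question using ASCII characters only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceNumberFinder {

    // A run of six or more digits with no digit directly before or after it.
    private static final Pattern INVOICE_NUMBER =
            Pattern.compile("(?<!\\d)\\d{6,}(?!\\d)");

    // Hypothetical helper: returns the first matching run found in the
    // first maxLines lines of the text, or Optional.empty() if there is none.
    public static Optional<String> findInvoiceNumber(String ocrText, int maxLines) {
        String[] lines = ocrText.split("\\R"); // \R matches any line break
        int limit = Math.min(maxLines, lines.length);
        for (int i = 0; i < limit; i++) {
            Matcher matcher = INVOICE_NUMBER.matcher(lines[i]);
            if (matcher.find()) {
                return Optional.of(matcher.group());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample input; the fourth line is never inspected.
        String ocrText = "-- header noise --\n"
                + "g DAILYW RK SHE 278464\n"
                + "E C 0 mp] on THE POUJER Hello, Mumbai, Co. Maha\n"
                + "Ref 99887766 on a fourth line";
        System.out.println(findInvoiceNumber(ocrText, 3).orElse("not found"));
    }
}
```

Returning an Optional makes the "no number found" case explicit, which may be useful when the OCR output is too noisy to contain a clean run of digits.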

Origin http://43.154.161.224:23101/article/api/json?id=299983&siteId=1