Java resume parsing

1. Analysis and extraction approach

1. Resume templates come in two layouts: horizontal and vertical

Horizontal layout

Vertical layout

2. A resume usually arrives in one of three forms: an image, a Word document, or a PDF document
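Because the extraction technique depends on which of the three forms a file is, a parser first has to classify each upload. The sketch below is one possible way to do that, assuming classification by the file's leading "magic" bytes rather than by its extension; the class name `ResumeFormatDetector` and the `Format` enum are hypothetical and not from the original article.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: guesses which of the three resume formats a file is
 * from its leading "magic" bytes instead of trusting the file extension.
 */
final class ResumeFormatDetector {

    enum Format { IMAGE, WORD, PDF, UNKNOWN }

    static Format detect(byte[] head) {
        // PDF files start with "%PDF".
        if (startsWith(head, new byte[] {'%', 'P', 'D', 'F'})) {
            return Format.PDF;
        }
        // .docx is a ZIP archive ("PK\3\4"); legacy .doc is an OLE2 compound file.
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})
                || startsWith(head, new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0})) {
            return Format.WORD;
        }
        // PNG and JPEG signatures cover most scanned or photographed resumes.
        if (startsWith(head, new byte[] {(byte) 0x89, 'P', 'N', 'G'})
                || startsWith(head, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF})) {
            return Format.IMAGE;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

In practice you would pass in the first eight or so bytes of the uploaded file (for example via `InputStream.readNBytes(8)`). Note that the ZIP and OLE2 signatures are shared by other Office formats such as spreadsheets, so a stricter check would also look inside the container.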

3. First, extract all of the text content from the file

Extracting text is relatively simple; the main task is to choose a tool that extracts the text accurately and completely.
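For PDFs, libraries such as Apache PDFBox (`PDFTextStripper`) are commonly used, Apache POI handles Word files, and images need OCR (for example Tesseract). To keep the example dependency-free, the sketch below covers only the `.docx` case, relying on the fact that a `.docx` file is a ZIP archive whose body text sits in `<w:t>` elements of `word/document.xml`. The class `DocxTextExtractor` is hypothetical, and the regex approach is a simplification: it ignores numeric character entities, text boxes, headers, and footers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical, dependency-free text extractor for .docx resumes. */
final class DocxTextExtractor {

    // Text runs live in <w:t> or <w:t xml:space="preserve"> elements (not <w:tab/>, <w:tbl>, ...).
    private static final Pattern TEXT_RUN = Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    static String extract(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return xmlToText(xml);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx file?");
    }

    static String xmlToText(String xml) {
        StringBuilder out = new StringBuilder();
        // Split on paragraph ends so each Word paragraph becomes one output line.
        for (String paragraph : xml.split("</w:p>")) {
            Matcher m = TEXT_RUN.matcher(paragraph);
            StringBuilder line = new StringBuilder();
            while (m.find()) {
                line.append(unescape(m.group(1)));
            }
            if (line.length() > 0) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    private static String unescape(String s) {
        // &amp; is replaced last so that "&amp;lt;" becomes "&lt;", not "<".
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&apos;", "'")
                .replace("&amp;", "&");
    }
}
```

Keeping one line per Word paragraph matters for the next step, because module headings are usually paragraphs of their own.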

4. Roughly divide the text content into modules

In general, a resume is organized into modules such as work experience and educational background.

So the full text can first be split into the content of each module.
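A simple way to perform this split is to treat short lines that match a known heading as module boundaries. The sketch below assumes that each heading stands alone on its own line; the heading vocabulary, the module keys (`"work"`, `"education"`, ...), and the class name are illustrative assumptions, not taken from the original article.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical splitter: cuts plain resume text into modules at known heading lines. */
final class ResumeSectionSplitter {

    // Illustrative heading vocabulary (an assumption); real resumes need many more variants.
    private static final Map<String, List<String>> HEADINGS = new LinkedHashMap<>();
    static {
        HEADINGS.put("basic", List.of("personal information", "basic information"));
        HEADINGS.put("education", List.of("education", "educational background"));
        HEADINGS.put("work", List.of("work experience", "employment history"));
        HEADINGS.put("projects", List.of("project experience", "projects"));
        HEADINGS.put("skills", List.of("skills", "professional skills"));
    }

    /** Returns module name -> module text; lines before the first heading go to "header". */
    static Map<String, String> split(String text) {
        Map<String, StringBuilder> modules = new LinkedHashMap<>();
        String current = "header";
        for (String raw : text.split("\\R")) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue;
            }
            String heading = matchHeading(line);
            if (heading != null) {
                current = heading; // start collecting a new module
                continue;
            }
            modules.computeIfAbsent(current, k -> new StringBuilder()).append(line).append('\n');
        }
        Map<String, String> result = new LinkedHashMap<>();
        modules.forEach((name, body) -> result.put(name, body.toString()));
        return result;
    }

    /** Matches a whole line against the vocabulary, ignoring case and a trailing colon. */
    private static String matchHeading(String line) {
        // Accept both the ASCII ':' and the full-width colon common in Chinese resumes.
        String normalized = line.toLowerCase(Locale.ROOT).replaceAll("\\s*[:\\uFF1A]\\s*$", "");
        for (Map.Entry<String, List<String>> entry : HEADINGS.entrySet()) {
            if (entry.getValue().contains(normalized)) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

A heading that shares its line with content (for example "Education: XX University") is not recognized by this sketch; handling it would mean matching the heading as a prefix and keeping the remainder of the line.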

5. Further divide the content within each module

For example, the work experience module contains the time period, company, project, position, and so on; these fields can be extracted using rules (regular patterns in the text) or part-of-speech information.
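The rule-based option can be sketched with regular expressions. The example below assumes that an entry line contains a date range such as "2018.07 - 2021.03", that company names end in a legal-form suffix such as "Co., Ltd." or "Inc.", and that positions come from a fixed title list; all three patterns, the `WorkEntry` record, and the class name are hypothetical. The part-of-speech approach mentioned above would instead need an NLP toolkit and is not shown.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule-based field extractor for one work-experience line. */
final class WorkEntryExtractor {

    record WorkEntry(String start, String end, String company, String position) {}

    // e.g. "2018.07 - 2021.03", "2018/07-2021/03", "2021.04 to present"
    private static final Pattern PERIOD = Pattern.compile(
            "(\\d{4}[./-]\\d{1,2})\\s*(?:-|~|to)\\s*(\\d{4}[./-]\\d{1,2}|present|now)",
            Pattern.CASE_INSENSITIVE);

    // Capitalized words followed by a legal-form suffix (an incomplete, assumed list).
    private static final Pattern COMPANY = Pattern.compile(
            "([A-Z][\\w&]*(?:\\s+[A-Z][\\w&]*)*\\s+(?:Co\\., Ltd\\.|Inc\\.|Ltd\\.|LLC|Corporation))");

    // A small, assumed job-title vocabulary; real systems use far larger lists.
    private static final List<String> TITLES = List.of(
            "java developer", "software engineer", "project manager", "test engineer");

    static Optional<WorkEntry> parse(String line) {
        Matcher period = PERIOD.matcher(line);
        if (!period.find()) {
            return Optional.empty(); // no time range: probably not the start of an entry
        }
        Matcher company = COMPANY.matcher(line);
        String companyName = company.find() ? company.group(1) : null;
        // Returns the normalized (lower-case) title from the vocabulary, or null.
        String lower = line.toLowerCase(Locale.ROOT);
        String position = TITLES.stream().filter(lower::contains).findFirst().orElse(null);
        return Optional.of(new WorkEntry(period.group(1), period.group(2), companyName, position));
    }
}
```

Rules like these are easy to write but brittle: any company name without a listed suffix, or any title outside the vocabulary, is silently missed.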

In the current research, the accuracy of the extraction results still needs to be improved.

 


Origin: blog.csdn.net/qq_38623939/article/details/128240093