Huawei OD Computer-Based Test Real Questions - Chinese Word Segmentation Simulator - 2023 OD Unified Examination (Paper C)

Problem description:

Given a continuous string that contains no spaces and consists only of lowercase English letters and English punctuation marks (comma, semicolon, period), and given a lexicon, perform accurate word segmentation on the string.
Note:
1. Accurate word segmentation: after the string is segmented, no words overlap. For example, "ilovechina" can be divided into "i, love, china" or "ilove, china" depending on the lexicon, but not into "i, ilove, china", where "i" overlaps.
2. Punctuation marks do not form words and are only used for sentence segmentation.
3. Dictionary: commonly used vocabulary based on statistics from an external knowledge base. Example: dictionary=["i", "love", "china", "lovechina", "ilove"]
4. Word segmentation principle: segmentation order takes priority, combined with the longest matching principle.
For "ilovechina", assuming the candidate words found are [i,ilove,lo,love,ch,china,lovechina], the output is [ilove,china].
 Wrong output: [i,lovechina]. Reason: "ilove" takes precedence over "lovechina" because it comes earlier in segmentation order.
 Wrong output: [i,love,china]. Reason: "ilove" > "i" under the longest matching principle.
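
The four rules above can be read as a greedy forward longest-match scan. The following is a minimal Python sketch under that reading, not an official solution. The function name `segment` is hypothetical, and the handling of a character that starts no dictionary word (it is emitted as a single-character token) is an assumption, since the rules above do not cover that case.

```python
import re


def segment(text, dictionary):
    """Greedy forward longest-match segmentation (illustrative sketch)."""
    words = set(dictionary)
    # The longest dictionary word bounds how far ahead we need to look.
    max_len = max((len(w) for w in words), default=1)
    result = []
    # Rule 2: punctuation only separates sentences and never forms a word.
    for sentence in re.split(r"[,;.]", text):
        i = 0
        while i < len(sentence):
            # Rule 4: at the current position, try the longest candidate first.
            for j in range(min(len(sentence), i + max_len), i, -1):
                if sentence[i:j] in words:
                    break
            else:
                # Assumption: no dictionary word starts here, emit one character.
                j = i + 1
            result.append(sentence[i:j])
            # Rule 1: continue after the matched word, so tokens never overlap.
            i = j
    return result


dictionary = ["i", "love", "china", "lovechina", "ilove"]
print(segment("ilovechina", dictionary))  # ['ilove', 'china']
```

Because the scan always moves left to right and takes the longest word at each position, "ilove" is chosen before "i" or "lovechina" can be considered, which matches the expected output in the example.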

Input description:

String length limit: 0<length<256
Lexicon length limit: 1<length<100000
Enter the sentence to be segmented in the first line, for example "ilovechina",
and enter the Chinese lexicon in the second line "


Origin blog.csdn.net/2301_76848549/article/details/135261648