Our game recently needed a sensitive-word filtering system. The approach I used is the common DFA (deterministic finite automaton) algorithm. I will not go into the algorithm itself here; implementing it requires a matching sensitive-word dictionary. The dictionary I obtained contains about 8000+ words, among them many duplicates and many words whose head contains another sensitive word.
What is a "head-containing" word? Look at the following example:
We know that in the DFA algorithm, matching stops as soon as a complete sensitive word has been read. So suppose the dictionary contains:
word 1: "ab", word 2: "abc"
Once "ab" is matched, "abc" can never be matched. And our game needs to replace each sensitive word in a sentence with some other character (e.g. *), rather than simply rejecting the whole sentence once it is judged to contain a sensitive word. So if "ab" is already a sensitive word, there is no need for "abc" to appear in the dictionary at all. I therefore need to process the dictionary as follows:
1. For identical words, keep only one copy.
2. Delete any sensitive word whose head contains another sensitive word.
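To make the head-containing rule concrete, here is a minimal sketch of a trie-based DFA matcher. This is my own illustration, not the article's code; the names buildTrie and filterText are hypothetical, and it matches byte by byte (so with multi-byte UTF-8 words the asterisk count is per byte, not per character). It shows why a longer word is dead weight once its prefix is in the dictionary:

```lua
-- Build a trie (the DFA's state table) from a list of sensitive words.
local function buildTrie(words)
    local root = {}
    for _, word in ipairs(words) do
        local node = root
        for i = 1, #word do
            local ch = word:sub(i, i)
            node[ch] = node[ch] or {}
            node = node[ch]
        end
        node.isEnd = true  -- mark the end of a complete sensitive word
    end
    return root
end

-- Replace each matched sensitive word in text with asterisks.
local function filterText(trie, text)
    local out = {}
    local i = 1
    while i <= #text do
        local node = trie
        local matchLen = 0
        -- Walk the trie from position i and stop at the FIRST complete
        -- word; this is why "abc" can never fire once "ab" is present.
        for j = i, #text do
            node = node[text:sub(j, j)]
            if not node then break end
            if node.isEnd then
                matchLen = j - i + 1
                break
            end
        end
        if matchLen > 0 then
            out[#out + 1] = string.rep("*", matchLen)
            i = i + matchLen
        else
            out[#out + 1] = text:sub(i, i)
            i = i + 1
        end
    end
    return table.concat(out)
end
```

For example, filterText(buildTrie({"ab", "abc"}), "xxabcyy") returns "xx**cyy": the matcher stops at "ab", so the longer entry "abc" never has any effect.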
But the existing dictionary has 8000+ words, far too many to clean by hand, so I thought of using Lua's built-in io library to process the dictionary file, which saves a lot of time. The code is as follows:
local function getNewWord()
    local wordsDataInput = {}
    local wordsDataOutput = {}

    -- Read the input file (read-only).
    local file_input = io.open("sensitive_words_input.txt", "r")
    io.input(file_input)
    -- Read the file line by line.
    local string_l = file_input:read("*l")
    while string_l ~= nil do
        table.insert(wordsDataInput, string_l)
        string_l = file_input:read("*l")
    end
    io.close(file_input)

    -- Open the output file for appending and set it as the default output.
    local file_output = io.open("sensitive_words.txt", "a")
    io.output(file_output)

    -- Data processing.
    -- Mark every word whose head contains str (i.e. str is a proper prefix of it).
    local function ifIsHeadInTable(str)
        for i = 1, #wordsDataInput do
            -- Plain find (4th argument true) so pattern magic characters
            -- inside words are not interpreted.
            local startIndex, endIndex = string.find(wordsDataInput[i], str, 1, true)
            if startIndex ~= nil and endIndex ~= nil then
                -- str matches at the head but does not cover the whole word:
                -- the word is head-containing and should be dropped.
                if startIndex == 1 and endIndex ~= string.len(wordsDataInput[i]) then
                    wordsDataInput[i] = "\n"  -- removal marker
                end
            end
        end
    end

    -- Check whether str is already in the output table.
    local function isHasSameInTable(str)
        if not wordsDataOutput or not next(wordsDataOutput) then
            return false
        end
        for key, value in ipairs(wordsDataOutput) do
            if value == str then
                return true
            end
        end
        return false
    end

    -- First remove head-containing words.
    for key, value in pairs(wordsDataInput) do
        ifIsHeadInTable(value)
    end
    -- Then remove duplicates.
    for key, value in ipairs(wordsDataInput) do
        if not isHasSameInTable(value) then
            table.insert(wordsDataOutput, value)
        end
    end

    for index, word in pairs(wordsDataOutput) do
        if word ~= "\n" then  -- skip entries marked as removed
            io.write(word .. "\n")
        end
    end
    io.close(file_output)
end

getNewWord()
After running the script, the document is down to fewer than 4000 words, about 35 KB, which greatly reduces the space and loading time the dictionary needs. Note, however, that this Lua script operates on UTF-8 encoded files; files in other encodings cannot be used with it.
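Because the script assumes UTF-8 input, it can be worth checking each line before processing it. Below is a minimal sketch of such a check (my own illustrative helper, not part of the original code; it validates byte-sequence structure only and does not reject every overlong encoding):

```lua
-- Return true if s is structurally well-formed UTF-8 (byte-level check).
local function isValidUtf8(s)
    local i, n = 1, #s
    while i <= n do
        local b = s:byte(i)
        local len
        if b < 0x80 then
            len = 1                      -- ASCII byte
        elseif b >= 0xC2 and b <= 0xDF then
            len = 2                      -- 2-byte sequence lead
        elseif b >= 0xE0 and b <= 0xEF then
            len = 3                      -- 3-byte sequence lead
        elseif b >= 0xF0 and b <= 0xF4 then
            len = 4                      -- 4-byte sequence lead
        else
            return false                 -- invalid lead byte
        end
        -- Every continuation byte must be in the range 0x80..0xBF.
        for j = i + 1, i + len - 1 do
            local cb = s:byte(j)
            if not cb or cb < 0x80 or cb > 0xBF then
                return false
            end
        end
        i = i + len
    end
    return true
end
```

A line that fails this check can be skipped or logged instead of being fed into the cleaning pass, where byte-level prefix comparisons on mis-encoded text would give wrong results.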