PanGu word segmentation notes


http://pangusegment.codeplex.com

Calling PanGu.dll

Initialization

Pangu word segmentation must be initialized when the process starts. The initialization call is as follows:

Default initialization

PanGu.Segment.Init();

This call uses the pangu.xml configuration file located in the same directory as PanGu.dll.

 

Initialization with a specified configuration file

PanGu.Segment.Init(filename);


filename is the full path to pangu.xml, for example @"c:\pangu.xml".

In some applications pangu.xml is not in the same directory as PanGu.dll, or the directory of PanGu.dll cannot be determined. In that case, call this overload so that the caller can specify the absolute path of the configuration file the segmenter should use.
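Since initialization belongs at process start, a common pattern is to guard it so that it runs only once, even if several threads reach it at the same time. The sketch below shows that idea under stated assumptions: `ResolveConfigPath` and `EnsureInitialized` are hypothetical helpers, not part of PanGu, and the real `PanGu.Segment.Init(filename)` call is replaced by a counting stand-in delegate so the example compiles and runs without PanGu.dll.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper (not part of PanGu): use an explicit path if one is given,
// otherwise look for pangu.xml in the application's base directory.
string ResolveConfigPath(string? explicitPath) =>
    string.IsNullOrEmpty(explicitPath)
        ? Path.Combine(AppContext.BaseDirectory, "pangu.xml")
        : Path.GetFullPath(explicitPath);

// Stand-in initializer so the sketch runs without PanGu.dll; it only counts calls.
// In a real application this would be: Action<string> init = PanGu.Segment.Init;
int initCount = 0;
Action<string> init = _ => Interlocked.Increment(ref initCount);

// Lazy<T> in ExecutionAndPublication mode runs the factory exactly once,
// and concurrent callers block until that single run has finished.
var initialized = new Lazy<bool>(() =>
{
    init(ResolveConfigPath(null));
    return true;
}, LazyThreadSafetyMode.ExecutionAndPublication);

void EnsureInitialized() => _ = initialized.Value;

EnsureInitialized();
EnsureInitialized();
Console.WriteLine($"initializer ran {initCount} time(s)");
```

Passing an explicit path to `ResolveConfigPath` corresponds to the `Init(filename)` overload; passing `null` mirrors the default lookup next to the application.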

 

Segmentation

 

using System.Collections.Generic;
using PanGu;

Segment segment = new Segment();
ICollection<WordInfo> words = segment.DoSegment(text);

 

or

ICollection<WordInfo> words = segment.DoSegment(text, options);


or

ICollection<WordInfo> words = segment.DoSegment(text, options, parameters);

 

where:
· text is the text to be segmented;
· options are custom segmentation options; by default, the options specified in pangu.xml are used;
· parameters are segmentation parameters; by default, the parameters specified in pangu.xml are used.
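A frequent use of the returned collection is to turn it back into a single string, for example a space-separated token list for an indexer that splits on whitespace. This sketch assumes a tuple stand-in carrying only the `Word` and `Position` fields of `WordInfo` (defined later in this article) and a hand-built list, so it runs without PanGu.dll; `JoinByPosition` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for WordInfo carrying only Word and Position; with PanGu you would pass
// the ICollection<WordInfo> returned by segment.DoSegment(text) and read the same fields.
string JoinByPosition(IEnumerable<(string Word, int Position)> words) =>
    string.Join(" ", words.OrderBy(w => w.Position).Select(w => w.Word));

// Hand-built input, deliberately out of order (not real PanGu output).
var sample = new List<(string Word, int Position)>
{
    ("分词", 2),
    ("盘古", 0),
    ("组件", 4),
};

Console.WriteLine(JoinByPosition(sample));
```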

The segmentation options are defined as follows:

 

public class MatchOptions
{
    /// <summary>
    /// Chinese name recognition
    /// </summary>
    public bool ChineseNameIdentify = false;
    /// <summary>
    /// Word frequency first
    /// </summary>
    public bool FrequencyFirst = false;
    /// <summary>
    /// Multi-dimensional segmentation (output overlapping alternative words)
    /// </summary>
    public bool MultiDimensionality = true;
    /// <summary>
    /// Multi-dimensional English segmentation; when on, letters and digits in English tokens are split apart.
    /// </summary>
    public bool EnglishMultiDimensionality = false;
    /// <summary>
    /// Filter stop words
    /// </summary>
    public bool FilterStopWords = true;
    /// <summary>
    /// Ignore spaces, carriage returns and tabs
    /// </summary>
    public bool IgnoreSpace = true;
    /// <summary>
    /// Force unigram (single-character) segmentation
    /// </summary>
    public bool ForceSingleWord = false;
    /// <summary>
    /// Enable Traditional Chinese
    /// </summary>
    public bool TraditionalChineseEnabled = false;
    /// <summary>
    /// Output both Simplified and Traditional forms
    /// </summary>
    public bool OutputSimplifiedTraditional = false;
    /// <summary>
    /// Unknown word recognition
    /// </summary>
    public bool UnknownWordIdentify = true;
    /// <summary>
    /// Filter English; takes effect only when FilterStopWords is enabled
    /// </summary>
    public bool FilterEnglish = false;
    /// <summary>
    /// Filter numbers; takes effect only when FilterStopWords is enabled
    /// </summary>
    public bool FilterNumeric = false;
    /// <summary>
    /// Ignore case in English
    /// </summary>
    public bool IgnoreCapital = false;
    /// <summary>
    /// English segmentation
    /// </summary>
    public bool EnglishSegment = false;
    /// <summary>
    /// Synonym output
    /// </summary>
    /// <remarks>
    /// Synonym output is generally used when segmenting search queries; it is not recommended at indexing time.
    /// </remarks>
    public bool SynonymOutput = false;
    /// <summary>
    /// Wildcard match output
    /// </summary>
    /// <remarks>
    /// Wildcard output is generally used when segmenting search queries; it is not recommended at indexing time.
    /// </remarks>
    public bool WildcardOutput = false;
    /// <summary>
    /// Segment the wildcard match results
    /// </summary>
    public bool WildcardSegment = false;
    /// <summary>
    /// Whether to apply user-defined matching rules
    /// </summary>
    public bool CustomRule = false;
}

 

 

The segmentation parameters are defined as follows:

[Serializable]
public class MatchParameter
{
    /// <summary>
    /// Redundancy of multi-dimensional segmentation
    /// </summary>
    public int Redundancy = 0;
    /// <summary>
    /// Weight of unknown words
    /// </summary>
    public int UnknowRank = 1;
    /// <summary>
    /// Weight of the best-matching word
    /// </summary>
    public int BestRank = 5;
    /// <summary>
    /// Weight of the second-best matching word
    /// </summary>
    public int SecRank = 3;
    /// <summary>
    /// Weight of the third-best matching word
    /// </summary>
    public int ThirdRank = 2;
    /// <summary>
    /// Weight of single characters output by force
    /// </summary>
    public int SingleRank = 1;
    /// <summary>
    /// Weight of numbers
    /// </summary>
    public int NumericRank = 1;
    /// <summary>
    /// Weight of English words
    /// </summary>
    public int EnglishRank = 5;
    /// <summary>
    /// Weight of symbols
    /// </summary>
    public int SymbolRank = 1;
    /// <summary>
    /// When Simplified and Traditional characters are output together, the weight of the characters
    /// that differ from the original text. For example, if the original text is Simplified Chinese,
    /// this is the weight of the Traditional output, and vice versa.
    /// </summary>
    public int SimplifiedTraditionalRank = 1;
    /// <summary>
    /// Weight of synonyms
    /// </summary>
    public int SynonymRank = 1;
    /// <summary>
    /// Weight of wildcard match results
    /// </summary>
    public int WildcardRank = 1;
    /// <summary>
    /// When the FilterEnglish option is in effect, English words longer than this length are filtered out.
    /// </summary>
    public int FilterEnglishLength = 0;
    /// <summary>
    /// When the FilterNumeric option is in effect, numbers longer than this length are filtered out.
    /// </summary>
    public int FilterNumericLength = 0;
    /// <summary>
    /// File name of the assembly containing the user-defined rules
    /// </summary>
    public string CustomRuleAssemblyFileName = "";
    /// <summary>
    /// Full name of the user-defined rule class, i.e. including its namespace
    /// </summary>
    public string CustomRuleFullClassName = "";
}
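The option and parameter comments describe an interplay: FilterEnglish only takes effect when stop-word filtering is on, and FilterEnglishLength then drops English words longer than the given length. The predicate below is a hypothetical model of that documented rule, useful for reasoning about option combinations; it is not PanGu's internal code, and measuring length as the string's character count is an assumption.

```csharp
using System;

// Hypothetical model of the documented rule (not PanGu's implementation):
// an English token is dropped only when FilterStopWords and FilterEnglish are both on
// and its length exceeds FilterEnglishLength.
bool WouldFilterEnglish(string word, bool filterStopWords, bool filterEnglish, int filterEnglishLength) =>
    filterStopWords && filterEnglish && word.Length > filterEnglishLength;

// With the default FilterEnglishLength = 0, every non-empty English word qualifies.
Console.WriteLine(WouldFilterEnglish("search", true, true, 0));   // True
// Raising the threshold keeps short words such as "is".
Console.WriteLine(WouldFilterEnglish("is", true, true, 3));       // False
// Without stop-word filtering, FilterEnglish has no effect.
Console.WriteLine(WouldFilterEnglish("search", false, true, 0));  // False
```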
 

 

The result is returned as a collection of WordInfo objects:

 

 

public class WordInfo : WordAttribute, IComparable<WordInfo>
{
    /// <summary>
    /// Current word type
    /// </summary>
    public WordType WordType;
    /// <summary>
    /// Original word type
    /// </summary>
    public WordType OriginalWordType;
    /// <summary>
    /// Starting position of the word in the text
    /// </summary>
    public int Position;
    /// <summary>
    /// Rank (weight) of this word
    /// </summary>
    public int Rank;
    /// <summary>
    /// The word
    /// </summary>
    public String Word;
    /// <summary>
    /// Part of speech
    /// </summary>
    public POS Pos;
    /// <summary>
    /// Word frequency
    /// </summary>
    public double Frequency;
}
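With multi-dimensional segmentation enabled, the output can contain overlapping words, some of which may start at the same Position. Grouping the results by Position is one way to inspect those alternatives. This sketch uses a hypothetical tuple stand-in with only the Word, Position and Rank fields of WordInfo, and a hand-built list whose ranks are illustrative rather than real PanGu output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for WordInfo with only the fields used here.
// Groups words by start position; inside each group the higher Rank is listed first.
List<(int Position, List<string> Words)> GroupByPosition(
    IEnumerable<(string Word, int Position, int Rank)> words) =>
    words.GroupBy(w => w.Position)
         .OrderBy(g => g.Key)
         .Select(g => (Position: g.Key,
                       Words: g.OrderByDescending(w => w.Rank).Select(w => w.Word).ToList()))
         .ToList();

// Hand-built input imitating overlapping multi-dimensional output.
var sample = new List<(string Word, int Position, int Rank)>
{
    ("中华", 0, 3),
    ("中华人民共和国", 0, 5),
    ("人民", 2, 3),
    ("共和国", 4, 3),
};

foreach (var (pos, alternatives) in GroupByPosition(sample))
    Console.WriteLine($"{pos}: {string.Join(" / ", alternatives)}");
```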
 

 

Configuration file PanGu.xml

 

<?xml version="1.0" encoding="utf-8"?>
<PanGuSettings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.codeplex.com/pangusegment">
<DictionaryPath>..\Dictionaries</DictionaryPath>
<MatchOptions>
<ChineseNameIdentify>true</ChineseNameIdentify>
<FrequencyFirst>false</FrequencyFirst>
<MultiDimensionality>false</MultiDimensionality>
<FilterStopWords>true</FilterStopWords>
<IgnoreSpace>true</IgnoreSpace>
<ForceSingleWord>false</ForceSingleWord>
<TraditionalChineseEnabled>false</TraditionalChineseEnabled>
<OutputSimplifiedTraditional>false</OutputSimplifiedTraditional>
<UnknownWordIdentify>true</UnknownWordIdentify>
<FilterEnglish>false</FilterEnglish>
<FilterNumeric>false</FilterNumeric>
<IgnoreCapital>false</IgnoreCapital>
<EnglishSegment>false</EnglishSegment>
<SynonymOutput>false</SynonymOutput>
<WildcardOutput>false</WildcardOutput>
<WildcardSegment>false</WildcardSegment>
<CustomRule>false</CustomRule>
</MatchOptions>
<Parameters>
<UnknowRank>1</UnknowRank>
<BestRank>5</BestRank>
<SecRank>3</SecRank>
<ThirdRank>2</ThirdRank>
<SingleRank>1</SingleRank>
<NumericRank>1</NumericRank>
<EnglishRank>5</EnglishRank>
<EnglishLowerRank>3</EnglishLowerRank>
<EnglishStemRank>2</EnglishStemRank>
<SymbolRank>1</SymbolRank>
<SimplifiedTraditionalRank>1</SimplifiedTraditionalRank>
<SynonymRank>1</SynonymRank>
<WildcardRank>1</WildcardRank>
<FilterEnglishLength>0</FilterEnglishLength>
<FilterNumericLength>0</FilterNumericLength>
<CustomRuleAssemblyFileName>CustomRuleExample.dll</CustomRuleAssemblyFileName>
<CustomRuleFullClassName>CustomRuleExample.PickupNokia</CustomRuleFullClassName>
<Redundancy>0</Redundancy>
</Parameters>
</PanGuSettings>

 

 

DictionaryPath specifies the directory that contains the dictionaries; it can be a relative or an absolute path.
MatchOptions corresponds to the segmentation options (the MatchOptions class).
Parameters corresponds to the segmentation parameters (the MatchParameter class).
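Because PanGu.xml declares the default namespace http://www.codeplex.com/pangusegment, any code that inspects the file has to qualify element names with that namespace. The sketch below reads DictionaryPath and one MatchOptions flag with System.Xml.Linq, for example to check a deployment's configuration; `ReadSettings` is a hypothetical helper and is not how PanGu itself loads its settings.

```csharp
using System;
using System.Xml.Linq;

// Every element in PanGu.xml lives in this default namespace, so unqualified
// names such as root.Element("DictionaryPath") would return null.
XNamespace ns = "http://www.codeplex.com/pangusegment";

// Reads DictionaryPath and the ChineseNameIdentify flag; missing values fall back to "" / false.
(string DictionaryPath, bool ChineseNameIdentify) ReadSettings(XDocument doc)
{
    XElement root = doc.Root ?? throw new InvalidOperationException("settings file has no root element");
    string dictionaryPath = (string?)root.Element(ns + "DictionaryPath") ?? "";
    bool chineseNameIdentify =
        (bool?)root.Element(ns + "MatchOptions")?.Element(ns + "ChineseNameIdentify") ?? false;
    return (dictionaryPath, chineseNameIdentify);
}

// Trimmed copy of the settings shown above; in practice use XDocument.Load(path).
var xml = XDocument.Parse(
    "<PanGuSettings xmlns=\"http://www.codeplex.com/pangusegment\">" +
    "<DictionaryPath>..\\Dictionaries</DictionaryPath>" +
    "<MatchOptions><ChineseNameIdentify>true</ChineseNameIdentify></MatchOptions>" +
    "</PanGuSettings>");

var settings = ReadSettings(xml);
Console.WriteLine($"DictionaryPath={settings.DictionaryPath}, ChineseNameIdentify={settings.ChineseNameIdentify}");
```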

Calling the highlighting component PanGu.HighLight.dll

// Create the HTML formatter; the arguments are the prefix and suffix placed around each highlighted word
PanGu.HighLight.SimpleHTMLFormatter simpleHTMLFormatter =
    new PanGu.HighLight.SimpleHTMLFormatter("<font color=\"red\">", "</font>");
// Create the highlighter from the HTML formatter and a Pangu Segment object
PanGu.HighLight.Highlighter highlighter =
    new PanGu.HighLight.Highlighter(simpleHTMLFormatter, new Segment());
// Set the number of characters in each summary fragment
highlighter.FragmentSize = 50;
// Get the summary fragment that best matches the keywords
// (keywords is the search string, news.Content the document text)
String summary = highlighter.GetBestFragment(keywords, news.Content);
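To make the formatter's role concrete: its two constructor arguments are inserted before and after each highlighted word. The sketch below shows only that prefix/suffix idea with a naive, case-sensitive search over a keyword list; `Highlight` is a hypothetical helper, not PanGu.HighLight's implementation, which is given a Segment object and also picks the best fragment of FragmentSize characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: wraps every occurrence of each keyword in pre/post tags.
// Longer keywords are tried first so "盘古分词" wins over "盘古" at the same spot.
string Highlight(string text, IEnumerable<string> keywords, string pre, string post)
{
    var terms = keywords.Where(k => k.Length > 0).OrderByDescending(k => k.Length).ToList();
    var sb = new StringBuilder();
    int i = 0;
    while (i < text.Length)
    {
        // Ordinal comparison of text[i..] against each term, without allocating substrings.
        string? hit = terms.FirstOrDefault(t => string.CompareOrdinal(text, i, t, 0, t.Length) == 0);
        if (hit != null)
        {
            sb.Append(pre).Append(hit).Append(post);
            i += hit.Length;
        }
        else
        {
            sb.Append(text[i]);
            i++;
        }
    }
    return sb.ToString();
}

Console.WriteLine(Highlight("PanGu segments Chinese text", new[] { "Chinese" },
    "<font color=\"red\">", "</font>"));
```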

 


Source: www.cnblogs.com/kelelipeng/p/11805121.html