SAS ECC 12: Notes on Issues Encountered

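Issues I ran into with SAS ECC 12:
1. The remove statement does not take the nobreak part into account;
2. When a node name is identical to a declared concept string, referencing it does not work properly;
3. Case-insensitive matching is somewhat buggy (in classifier or in concept? I no longer remember which);
4. When importing an Excel file for testing, overly long cell content overflowed, which I mistook for a failed match;
5. The software's installation path must not contain Chinese characters; the same goes for R.
......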
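I have forgotten most of the other issues. This kind of paid, “hand-crafted feature” approach to text classification probably won't be needed again anyway.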
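
For item 5, a quick check before installing can save a reinstall. The snippet below is only a rough sketch of my own, not anything shipped with SAS or R: the helper name `non_ascii_parts` and the two sample paths are made up for illustration, and it flags any non-ASCII component, which is broader than Chinese characters alone.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path: str) -> list:
    """Return the components of a Windows-style path that contain non-ASCII characters."""
    # PureWindowsPath splits on backslashes on any OS, so this also runs on Linux/macOS.
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


if __name__ == "__main__":
    # Hypothetical install locations, for illustration only.
    for candidate in (r"C:\Program Files\SASHome", r"D:\软件\SAS"):
        bad = non_ascii_parts(candidate)
        status = "ok" if not bad else f"non-ASCII components: {bad}"
        print(f"{candidate} -> {status}")
```

An empty list means the path is plain ASCII; anything else lists the offending folder names so they can be renamed before installation.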

Appendix:

SAS Content Categorization is designed to develop and deploy categorization and extraction rules to classify unstructured content. Industry-specific taxonomies can be added to quick-start taxonomy development. With improved graphical reports for precision and recall, rule definition and refinement is further simplified by using new co-reference operators for pronoun resolution. Initial categories and subcategories can also be generated from Wikipedia.
Based on a managed, collaborative taxonomy definition, linguistic concept rules and context-sensitive term and phrase extraction are defined in the categorization-model development environment. The product also includes a document conversion utility and a deployment environment.
SAS Enterprise Content Categorization can be further extended to create a unique environment from a variety of add-on modules. The modules enable linguistic based document summarization, document duplication detection, search and indexing, file- or web-crawling, content alerts, and an editorial workbench.
For organizations with single-user needs for categorization only (that is, no context-sensitive extraction), an alternate, content-categorization product can be used with all of the same add-on modules.
This product includes English and, if appropriate, the native language associated with the installed site. Other languages are available as add-ons by request.
The most recent release is SAS Content Categorization 12.1.

Reposted from blog.csdn.net/u013303361/article/details/79894149