Rasa Course, Rasa Training, Rasa Interview, Rasa Practical Series: Transliteration

Transliteration

Reference link: https://forum.rasa.com/t/phonetics-featurizer/42132/4

Hey @koaning, first let me quickly say that I'm working with Greek. We had some problems using n-grams with the CountVectorsFeaturizer. Mainly, we get misclassifications because one word is a substring of another, or because two words share the same root. Take "εισερχομενες" and "εξερχομενες", which mean "incoming" and "outgoing". This happens a lot in Greek. I guess an example in English might be "classify" and "publish".

There is also a "new" way of writing Greek called Greeklish: people write Greek using Latin characters because they are tired of switching keyboard layouts (mostly young people). So one person would write "εισερχομενες" and another would write "eiserxomenes". It's as if people are doing some kind of phonetic processing while they write. Plain n-grams cannot handle this. I thought of two ways to solve this problem:

One is to create a custom preprocessor (I already have one) that converts the message text to a phonetic representation and then calls message.set("text", converted_text), so that the next components in the pipeline see the converted text. The same conversion is also applied during training.
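The core of such a preprocessor can be sketched in plain Python. This is only an illustration, not the poster's actual component: the GREEK_TO_LATIN table, to_greeklish, and char_ngrams are hypothetical names, and the mapping follows just one simplified Greeklish convention (for example χ as "x"), whereas real users spell inconsistently. It is a character transliteration, used here as a stand-in for a true phonetic encoding.

```python
import unicodedata

# Hypothetical, simplified Greek-to-Latin table (one Greeklish convention
# among many). Chosen so that "εισερχομενες" becomes "eiserxomenes".
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "x", "ψ": "ps",
    "ω": "o",
}


def strip_accents(text):
    """Remove combining marks such as the Greek tonos."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_greeklish(text):
    """Map Greek letters to Latin ones; leave every other character as-is."""
    text = strip_accents(text.lower())
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text)


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


if __name__ == "__main__":
    greek, latin = "εισερχομενες", "eiserxomenes"

    # Without normalization the two spellings share no character trigrams.
    print(char_ngrams(greek) & char_ngrams(latin))

    # After normalization both spellings yield the same trigrams.
    print(char_ngrams(to_greeklish(greek)) == char_ngrams(to_greeklish(latin)))

    # In a Rasa custom component, the converted text would be written back
    # with message.set("text", ...) as described above; that wiring is
    # omitted here because it depends on the Rasa version in use.
```

Normalizing both scripts to a single form lets the n-gram featurizer see the Greek and Greeklish spellings as the same token. It does not by itself address the overlap between words such as "εισερχομενες" and "εξερχομενες", which still share most of their trigrams after conversion.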

Origin blog.csdn.net/duan_zhihua/article/details/123932239