Free, open-source multilingual translation models

 

Today I will introduce the free, open-source multilingual translation models from the University of Helsinki. The university has developed more than 1,400 translation models, all of which can be downloaded and used for free from the Hugging Face website. This post covers two of them: the Chinese-to-English model and the English-to-Chinese model.

My machine runs Windows 11 with Anaconda and Python 3.10. It is best to create a dedicated virtual environment in Anaconda so that the packages below do not conflict with the dependencies of other projects. Install the following packages in that environment:

  • pip install transformers[sentencepiece]
  • pip install torch
  • pip install sacremoses (optional)
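After installing, it can help to confirm that the packages are visible from inside the virtual environment. The following is a minimal sketch using only the standard library; the helper name check_packages is made up for this example, and sentencepiece is listed on the assumption that the transformers[sentencepiece] extra installs it as a separate module.

```python
import sys
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # sacremoses is optional, so a MISSING line for it is not an error.
    required = ["transformers", "sentencepiece", "torch", "sacremoses"]
    for name, found in check_packages(required).items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

If a package shows up as MISSING, the usual cause is that pip was run outside the activated environment.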

Model download

We need to download each model and its required files from the Hugging Face website and store them in two dedicated local folders, one for the Chinese-to-English model and one for the English-to-Chinese model (the code below uses ./zh-en/ and ./en-zh/).

Only 7 files need to be downloaded for each model.
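As an alternative to downloading by hand, the files can also be fetched from a script. This is only a sketch under assumptions: the repository IDs below are my guess at the two models meant here (they are not stated in the text), the helper names are invented for this example, and snapshot_download from the huggingface_hub package (installed as a dependency of transformers) fetches every file in the repository, which is more than the 7 files mentioned above.

```python
from pathlib import Path

# Assumed Hub repository IDs (not given in the text), keyed by local folder name.
MODEL_REPOS = {
    "zh-en": "Helsinki-NLP/opus-mt-zh-en",
    "en-zh": "Helsinki-NLP/opus-mt-en-zh",
}


def plan_downloads(base_dir="."):
    """Return (repo_id, target_folder) pairs without touching the network."""
    return [(repo_id, Path(base_dir) / folder)
            for folder, repo_id in MODEL_REPOS.items()]


def download_models(base_dir="."):
    """Download all files of each repository into its local folder."""
    # Imported lazily so plan_downloads() works even without the package.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(base_dir):
        snapshot_download(repo_id=repo_id, local_dir=target)
        print(f"{repo_id} -> {target}")
```

Calling download_models() once from the project folder would create ./zh-en/ and ./en-zh/, matching the paths used in the code that follows.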

 

 

1. Chinese to English translation

Next, we implement the Chinese-to-English translation function by loading the local model:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

# Folder that holds the downloaded Chinese-to-English model files
model_path = './zh-en/'
# Create the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Create the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
# Create the translation pipeline; a separate name keeps the imported pipeline() function usable
translator = pipeline("translation", model=model, tokenizer=tokenizer)

Now let's translate some text:

chinese = """
六岁时,我家在荷兰的莱斯韦克,房子的前面有一片荒地,
我称其为“那地方”,一个神秘的所在,那里深深的草木如今只到我的腰际,
当年却像是一片丛林,即便现在我还记得:“那地方”危机四伏,
洒满了我的恐惧和幻想。
"""
result = translator(chinese)
print(result[0]['translation_text'])

 

chinese = """
谷歌于2019年推出了 53 量子位的 Sycamore 处理器,
而本次实验进一步升级了 Sycamore 处理器,已提升达到 70 个量子位。
谷歌表示升级 Sycamore 处理器之后,虽然受到相干时间等其它因素的影响,
其性能是此前版本的 2.41 亿倍。
在实验中,科学家们执行了随机电路采样任务。在量子计算中,
这涉及通过运行随机电路和分析结果输出来测试量子计算机的性能,
以评估其在解决复杂问题方面的能力和效率。
"""

result = translator(chinese)
print(result[0]['translation_text'])
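The sample passages above are short. Models of this kind accept only a limited number of tokens per input, so for longer documents one common approach (not something the original post does) is to split the text into sentences and translate them in batches. The sketch below assumes that the pipeline object accepts a list of strings and returns one dict per string; split_sentences and translate_long are names made up for this example.

```python
import re


def split_sentences(text):
    """Split Chinese text into sentences, keeping the end punctuation."""
    # Join the lines first: Chinese text needs no spaces between lines.
    flat = "".join(line.strip() for line in text.splitlines())
    parts = re.split(r"(?<=[。！？])", flat)
    return [p for p in parts if p]


def translate_long(translate, text, batch_size=8):
    """Translate text sentence by sentence with any pipeline-like callable."""
    sentences = split_sentences(text)
    outputs = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # A translation pipeline returns one dict per input string.
        outputs.extend(item["translation_text"] for item in translate(batch))
    return " ".join(outputs)
```

With the pipeline object created above passed as the first argument, translate_long(..., chinese) would return the whole passage as one English string. The split is deliberately simple: a closing quotation mark that follows a full stop ends up at the start of the next sentence.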

 

2. English to Chinese translation

Next, let's implement English-to-Chinese translation in the same way, this time loading the model from the other local folder:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

model_path = './en-zh/' 
english="""
China has expanded its share of global commercial services exports from 3 percent \
in 2005 to 5.4 percent in 2022, according to a report jointly released by \
the World Bank Group and World Trade Organization earlier this week.
"""

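# Create the tokenizer and the model from the local folder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
# Name the pipeline object pp so it does not shadow the imported pipeline() function
pp = pipeline("translation", model=model, tokenizer=tokenizer)

finaltext = pp(english)
print(finaltext[0]['translation_text'])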

 

%%time
english="Which TV can I buy if I'm on a budget?"
finaltext = pp(english)
print(finaltext[0]['translation_text'])

 

%%time
english="""
The European Union and Japan will increase cooperation around key \
technologies, including artificial intelligence and computer chip \
production, the 27-member bloc's commissioner for the internal market \
has said.
"""
finaltext = pp(english)
print(finaltext[0]['translation_text'])
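Note that %%time is a Jupyter cell magic and only works as the first line of a notebook cell. In a plain Python script, roughly the same measurement can be sketched with the standard library; the helper name timed is made up for this example.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, finaltext, seconds = timed(pp, english) would return the translation together with the wall-clock time it took.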

 

Try the two models yourself and see how well they translate.
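One rough way to try the two models against each other is a round trip: translate a text with one model, translate the result back with the other, and compare it with the original. This is my own suggestion rather than the author's method, and the similarity ratio is only a crude signal, not a real quality metric. The sketch assumes both arguments behave like the pipeline objects above (a string in, a list with one dict out); round_trip_score is a made-up name.

```python
from difflib import SequenceMatcher


def round_trip_score(text, forward, backward):
    """Translate text forward and back, then compare it with the original."""
    translated = forward(text)[0]["translation_text"]
    restored = backward(translated)[0]["translation_text"]
    # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
    ratio = SequenceMatcher(None, text.strip(), restored.strip()).ratio()
    return translated, restored, ratio
```

A low ratio does not always mean a bad translation, because a correct paraphrase also lowers it, so reading the intermediate text is still the better check.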

References

Chinese to English model

English-to-Chinese model

 

Origin blog.csdn.net/weixin_42608414/article/details/131575118