[Chinese Arena] In-depth experience and evaluation of large models

Introduction: This time I took an in-depth look at the large language models in the Chinese Arena, trying them in three fields: creative writing, code writing, and Chinese-language games. Below is my detailed evaluation report.


1. Opening

With science and technology developing so rapidly, the Chinese Arena offers a series of large models for us to try, covering fields such as creative writing, code writing, and Chinese-language games. It truly is a wide-ranging "knowledge kingdom". Next, I will share my experience in these three fields in detail.



2. Evaluation of creative writing ability

1. Background: The creative writing test is designed to measure a model's creativity, coherence, and understanding of a given topic.

2. Test models: The models in this evaluation are Model A, billa-7b-sft-v1, and Model B, moss-moon-003-sft-v1.

3. Questions and answers: I asked the models to write an essay titled "The Future of Artificial Intelligence". Model A produced more in-depth and forward-looking content, while Model B focused more on the current state of development. Both performed fairly well in terms of coherence and logic, but Model A was more insightful when looking ahead.



3. Evaluation of code writing ability

1. Background: The code writing test aims to verify the model's understanding of programming languages and its ability to generate code.

2. Test content: I gave a simple programming requirement and asked the models to generate the corresponding Python code snippets.

3. Conclusion: Both models generated code quickly, but Model B did better on details and optimization, with a clear code structure and strong readability. Model A completed the task, but its code was slightly redundant.
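To make the difference between "clear" and "slightly redundant" code more concrete, here is a purely hypothetical illustration. The actual requirement I gave and the models' real outputs are not reproduced in this post, so the task ("return the squares of the even numbers in a list") and the function names below are my own assumptions, not either model's answer.

```python
# Hypothetical task (NOT the one used in the original test):
# "Return the squares of the even numbers in a list."

def even_squares_verbose(numbers):
    # Redundant style: manual index loops, a temporary list of evens,
    # then a second pass to square them.
    evens = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            evens.append(numbers[i])
    result = []
    for j in range(len(evens)):
        square = evens[j] * evens[j]
        result.append(square)
    return result


def even_squares(numbers):
    """Return the squares of the even numbers in `numbers`, in order."""
    # Single pass: filtering and transforming in one list comprehension.
    return [n * n for n in numbers if n % 2 == 0]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(even_squares_verbose(data))  # [4, 16, 36]
    print(even_squares(data))          # [4, 16, 36]
```

Both functions return the same result; the second is simply shorter and easier to read, which is the kind of gap I mean when I call one answer "slightly redundant".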



4. Evaluation of Chinese-language games

1. Background: The Chinese-language game test aims to understand how the models perform on Chinese quizzes and text adventure games.

2. Questions and answers: First, I asked the models about a Chinese idiom: what does "calling a deer a horse" (指鹿为马) mean? Model A gave a relatively concise answer, while Model B gave a more complete one and cited the specific historical events behind the idiom.

3. Conclusion: In the Chinese-language games, Model B performed better. Still, this reminds me that however advanced a model is, we cannot rely on it completely; machines have their limitations after all.



5. Conclusion

After this in-depth evaluation, I found that the large models in the Chinese Arena perform well across these fields, though there is still room for improvement. A platform like this not only helps us acquire knowledge quickly but also exercises our critical thinking, letting us truly "dance with the machine".


Origin: blog.csdn.net/m0_63722685/article/details/132347699