Decrypting the Essence of Prompts, Hands-On Evaluation, and Source Code Analysis (1)

Chapter 9: Decrypting the Essence of Prompts, Hands-On Evaluation, and Source Code Analysis
9.1 Customer Service Case
This section examines the internal working mechanism of prompts along three dimensions: cases, source code, and papers. We begin with the code, which performs evaluation (Evaluation) of applications built on large language models, clearly a crucial topic. For any machine-learning model, and for any NLP project, evaluating the application is core work: upgrading or iterating a program requires measuring its performance and collecting baseline data. Evaluating large models, however, differs from traditional machine learning. This is especially true for GPT-series and other generative language models, because the content they generate cannot simply be scored against fixed labels in the classic supervised sense.
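To make the contrast concrete, here is a minimal sketch (not from the original source) of why classic label-based scoring breaks down for generative output: exact-match accuracy works for classification labels, but a generated answer can be correct while matching no reference string. The example strings are invented for illustration.

```python
# Traditional classification eval: predictions are compared to gold labels exactly.
def exact_match_accuracy(predictions, labels):
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Generative output rarely matches a reference string exactly, even when correct.
reference = "The order will arrive in 3 business days."
generated = "Your order should arrive within three business days."

# A good answer scores 0.0 under exact match, which is why rubric-based
# evaluation with an LLM grader is used instead.
print(exact_match_accuracy([generated], [reference]))
```

This is what motivates the rubric-based approach discussed below: instead of string comparison, a model is asked to grade the answer against criteria.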
Gavin's WeChat: NLP_Matrix_Space
OpenAI has published official guidance on this, and DeepLearning.AI, drawing on OpenAI's guiding ideas, proposed concrete steps for evaluating results. Let's take a look. The prompts in the DeepLearning.AI example are classic: first, because they are effective; second, because many other open-source frameworks and products use similar implementations or similar prompts.

def eval_with_rubric(test_set, assistant_answer):
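The signature above is truncated in the original. Based on the surrounding description, `eval_with_rubric` builds a grading prompt from the test case and the agent's answer and sends it to an evaluator model. Below is a hedged sketch of that pattern; the rubric wording, the `test_set` field names, and the injected `call_model` stub are assumptions for illustration, not the course's exact code:

```python
def build_rubric_prompt(test_set, assistant_answer):
    """Assemble the grading messages sent to the evaluator model."""
    system_message = (
        "You are an assistant that evaluates how well a customer service "
        "agent answers a user question, given the context that was used."
    )
    user_message = (
        f"Question: {test_set['customer_msg']}\n"
        f"Context: {test_set['context']}\n"
        f"Agent answer: {assistant_answer}\n"
        "Compare the answer to the context. Is the answer based only on the "
        "context? Does it add unsupported information? Answer Y or N for each "
        "check, then give an overall grade."
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]


def eval_with_rubric(test_set, assistant_answer, call_model):
    """Return the evaluator model's judgment of the agent's answer.

    `call_model` stands in for a chat-completion API call; it is injected
    as a parameter here so the function can be exercised offline.
    """
    messages = build_rubric_prompt(test_set, assistant_answer)
    return call_model(messages)
```

In the actual course code the model call is made directly inside the function; injecting it as a parameter is a testing convenience, not a claim about the original design.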


Origin blog.csdn.net/duan_zhihua/article/details/131679540