Attendance Record | Beijing Zhiyuan Conference - Basic Model Frontier Technology Forum

On the morning of June 10, 2023, I attended the Beijing Zhiyuan Conference online, mainly watching the 2023 Beijing Zhiyuan Conference - Basic Model Frontier Technology Forum. Below are the points that interested me and my thoughts from the forum.

The speakers of this forum are as follows:
[Image: speaker lineup of the forum]

The guests included Ms. Liu Yinhan and Ms. Zhou Yanqi from industry, Mr. Liu Zhiyuan, Mr. Liu Pengfei, and Ms. Liu Jing from academia, and Ms. Lin Yonghua from the research institute. Among these guests, the one I am most familiar with is Mr. Liu Pengfei, the proposer of prompt learning ~ I heard that he has now joined Shanghai Jiao Tong University to carry on his follow-up research work.

You can visit the 2023 Beijing Zhiyuan Conference - Basic Model Frontier Technology Forum page to review the teachers' reports. Here I will focus on the research points in the roundtable discussion that interested me most. PS: While watching the video, we could enjoy real-time speech recognition and translation powered by Baidu's AI simultaneous interpretation technology, which greatly improved the viewing experience~

Teacher Liu Pengfei mentioned that pretraining and prompt learning can be regarded as a "deposit & withdraw" process, the key point being the information asymmetry between the two. This angle is very novel; I had not thought of it before. What he is currently interested in is AI for Mathematics, that is, using AI to solve math problems. He also mentioned LLMs' ability to understand structured data such as JSON and HTML.
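
To make the "withdraw" metaphor concrete, here is a minimal sketch of cloze-style prompt learning (my own illustration, not from the talk); it assumes the Hugging Face transformers library, and the model name and template are arbitrary choices:

```python
from transformers import pipeline

# The "withdraw": query knowledge the model "deposited" during pretraining by
# rephrasing a task as a cloze question instead of training a new classifier.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Sentiment classification reformulated as a fill-in-the-blank template.
template = "The movie was absolutely wonderful. Overall it was a [MASK] film."
for candidate in fill(template, targets=["great", "terrible"]):
    print(candidate["token_str"], round(candidate["score"], 4))
```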

From the perspective of academic research, Ms. Liu Jing talked about doing research in the era of large models. With large models for perception and decision-making emerging continuously, one "feasible path" for academic research is to be "complementary" to enterprises: although we have no way of knowing a company's secret recipe, cooperation can speed up how quickly products land and better serve the public. The second path is exploratory research, such as AI for Science. Teacher Liu also mentioned that prompt engineering may be a profession that disappears within a few years, given that soft prompt learning is developing in full swing.
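
As a side note on soft prompts, the sketch below (my own illustration, not from the talk) shows the basic idea of prompt tuning: a handful of trainable "virtual token" embeddings replace hand-written prompts while the backbone model stays frozen.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, hidden_dim: int):
        super().__init__()
        # The only trainable parameters: prompt_len "virtual token" embeddings.
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_dim) from the frozen model's
        # embedding layer; the soft prompt is prepended along the sequence axis.
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage: feed the concatenated embeddings into a frozen transformer and
# backpropagate only into SoftPrompt.prompt.
soft = SoftPrompt(prompt_len=20, hidden_dim=768)
dummy = torch.randn(2, 16, 768)          # stand-in for real token embeddings
print(soft(dummy).shape)                 # torch.Size([2, 36, 768])
```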

Teacher Liu Yinhan, the proposer of RoBERTa, BART, and mBART, gave her report this time as core founder and CTO of Birch.ai. It was a pleasure to see her in this forum! She mentioned that two key points for LLM generation quality are RLHF and the reward model. In addition, artificial general intelligence (AGI) is not strictly necessary in professional domains: enterprises have their own models, and existing large models carry security risks and privacy-protection concerns, so general large models will not be the first choice for enterprises. Their company works with healthcare data, and during training, for massive data, they use a sliding-window method to improve training efficiency. I am not sure how this relates to parallelism as I understand it.
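
My guess at what the sliding-window method refers to (an assumption on my part, not Birch.ai's actual pipeline): a long record is split into fixed-size, overlapping chunks so each chunk fits the model's context length, and the chunks can then be batched and processed in parallel like ordinary samples.

```python
from typing import List

def sliding_windows(tokens: List[int], window: int = 512, stride: int = 384) -> List[List[int]]:
    """Split a long token sequence into overlapping chunks of at most `window` tokens."""
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

# A 2000-token document becomes five training samples that fit a 512-token context.
doc = list(range(2000))
print([len(c) for c in sliding_windows(doc)])   # [512, 512, 512, 512, 464]
```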

Teacher Zhou Yanqi, a Google research scientist and co-author of the T5 model, mainly researches MoE (Mixture-of-Experts); her latest paper is Mixture-of-Experts with Expert Choice Routing (NeurIPS 2022). I had never heard of this concept before. After a quick look, I found there is already plenty of related work, such as the vision model V-MoE, the language model Switch Transformers, and the multimodal model LIMoE. Personally, I feel MoE has something in common with ensemble learning: both aggregate the decisions of multiple decision-makers. In addition, Ms. Zhou mentioned that whether the autoregressive decoding used in LLM inference can be parallelized is a direction worth exploring. However, I think autoregression is more in line with how humans express language; after all, language is temporal information. Another possibility is that the brain has already parallelized language internally, but it still has to be expressed sequentially in time; that is a question for cognitive science~
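
For readers who, like me, are new to MoE, here is a heavily simplified token-level MoE layer with top-1 routing (my own sketch, not the implementation from the Expert Choice Routing paper; in expert-choice routing the direction is reversed and each expert selects its top tokens):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 128):
        super().__init__()
        # Router scores each token against every expert.
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim); each token is processed only by its best expert,
        # so compute grows with tokens, not with the total number of experts.
        probs = F.softmax(self.router(x), dim=-1)   # (num_tokens, num_experts)
        gate, choice = probs.max(dim=-1)            # top-1 gate value and expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                        # 10 tokens, hidden size 64
print(TinyMoE(dim=64)(tokens).shape)                # torch.Size([10, 64])
```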

In addition, during the discussion the teachers also mentioned the hallucination problem of AI-generated content. There is already related research on this problem, and it is worth exploring further.

Attached are some technical terms learned in this forum:

SFT: Supervised Fine-Tuning; the corresponding training data is called SFT data (see the sketch after this list)
ROI: Return On Investment
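
As a small illustration of the SFT term above (my own sketch, not tied to any particular framework): each SFT example pairs a prompt with a reference response, and the causal language model is fine-tuned with cross-entropy computed only on the response tokens.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, labels: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Next-token loss on one (prompt + response) sequence, masking the prompt part.

    logits: (seq_len, vocab_size) from a causal LM; labels: (seq_len,) token ids.
    """
    pred = logits[:-1]                       # position t predicts token t+1
    target = labels[1:].clone()
    target[: prompt_len - 1] = -100          # ignore prompt tokens in the loss
    return F.cross_entropy(pred, target, ignore_index=-100)

# Example with random numbers standing in for a real model's outputs.
vocab_size, seq_len, prompt_len = 100, 12, 5
loss = sft_loss(torch.randn(seq_len, vocab_size),
                torch.randint(0, vocab_size, (seq_len,)),
                prompt_len)
print(loss.item())
```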


References

  1. MoE in Large Models (Zhihu)
  2. A Brief Overview of Sparse Large Models: from MoE and Sparse Attention to GLaM (Xi Xiaoyao's CSDN blog)
  3. Scaling Laws for Neural Language Models (Zhihu)
  4. Detailed Explanation of ChatGPT Principles and Practice (1): SFT (GPT Model Fine-Tuning) (Zhihu)
  5. Countermeasures for GPT-4 "Hallucination" (Zhihu)
