Compiled by | Su Mi
Produced by | CSDN (ID: CSDNnews)
Microsoft, which has partnered with OpenAI and Meta to advance large models, is also accelerating the iteration of its own small models. Today, Microsoft officially released Phi-2, a 2.7-billion-parameter language model. It is a text-to-text AI model with strong reasoning and language understanding capabilities.
Microsoft Research also posted on its official X account: "Phi-2's performance is better than other existing small language models, yet it is small enough to run on a laptop or mobile device."
Can Phi-2 really outperform a model 25 times larger?
In the official announcement of Phi-2's release, Microsoft Research stated plainly at the outset that Phi-2 can match or outperform models up to 25 times its size.
This is somewhat awkward for Google: many commenters noted that this would mean Phi-2 easily surpasses the smallest version of Gemini, which Google had only just released.
So what do the numbers actually show?
Microsoft evaluated Phi-2 against Mistral and Llama-2 at 7B and 13B parameters across a range of current benchmarks: Big Bench Hard (BBH), commonsense reasoning (PIQA, WinoGrande, ARC easy and challenge, SIQA), language understanding (HellaSwag, OpenBookQA, MMLU (5-shot), SQuADv2, BoolQ), mathematics (GSM8k), and coding (HumanEval).
The result: Phi-2, with only 2.7 billion parameters, surpasses the Mistral 7B and Llama-2 7B and 13B models. Notably, on multi-step reasoning tasks (i.e., coding and mathematics), Phi-2 even outperforms the Llama-2-70B model, which is 25 times its size.
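The "25 times" figure can be sanity-checked from the parameter counts alone. The back-of-the-envelope calculation below is ours, not Microsoft's:

```python
# Rough size comparison between Phi-2 and Llama-2-70B,
# using the parameter counts cited in the article.
phi2_params = 2.7e9        # Phi-2: 2.7 billion parameters
llama2_70b_params = 70e9   # Llama-2-70B: 70 billion parameters

ratio = llama2_70b_params / phi2_params
print(f"Llama-2-70B is about {ratio:.1f}x the size of Phi-2")  # ~25.9x
```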
In addition, as mentioned above, Microsoft researchers put the results of a head-to-head comparison with Google's newly released Gemini Nano 2 directly into the benchmark table. As expected, despite its smaller size, Phi-2 still surpasses Gemini Nano 2.
Beyond the benchmarks, the researchers also seemed to be alluding to Google's Gemini demonstration video from a few days earlier, which was reportedly staged. In it, Google claimed that its upcoming largest and most powerful model, Gemini Ultra, could solve fairly complex physics problems and even correct a student's mistakes.
It turns out that although Phi-2 may be a fraction of the size of Gemini Ultra, it is also capable of answering the questions correctly and correcting the student when given the same prompts.
Microsoft's improvements
In its blog post, Microsoft Research explained why such a small model achieves such outstanding results.
The first is improved training-data quality. Phi-2 is a Transformer-based model trained with a next-word-prediction objective on 1.4T tokens drawn from synthetic and web datasets for NLP and coding, covering science, daily activities, theory of mind, and more, to teach the model common sense and reasoning. Training Phi-2 took 14 days on 96 A100 GPUs.
Second, Microsoft used innovative scaling techniques to embed the knowledge of its earlier 1.3-billion-parameter Phi-1.5 model within the 2.7-billion-parameter Phi-2.
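From the figures above (1.4T tokens, 96 A100 GPUs, 14 days), one can estimate the implied per-GPU throughput. This is a rough derivation from the article's numbers, not a figure Microsoft reported:

```python
# Implied training throughput from the reported run:
# 1.4T tokens over 14 days on 96 A100 GPUs.
tokens = 1.4e12          # total training tokens
gpus = 96                # A100 GPUs
days = 14                # wall-clock training time

seconds = days * 24 * 3600
tokens_per_gpu_per_sec = tokens / (gpus * seconds)
print(f"~{tokens_per_gpu_per_sec:,.0f} tokens per GPU per second")  # ~12,000
```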
Microsoft notes that Phi-2 is a base model that has not been aligned through reinforcement learning from human feedback (RLHF) or instruction fine-tuning. Nonetheless, Microsoft observed better behavior from Phi-2 in terms of toxicity and bias than from existing aligned open-source models.
Final thoughts
The release of Phi-2 does mark a breakthrough in small-model performance, but some media outlets have pointed out that it still carries a significant limitation.
According to the Microsoft Research License, Phi-2 may only be used for "non-commercial, non-revenue-generating, research purposes" and not for commercial ones. Businesses hoping to build products on top of it are therefore out of luck.
Source: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/