Here are 14 Amazing Details Overlooked About GTP-4!

Source: Wall Street News (id: wallstreetcn)

On March 14, OpenAI released GPT-4. Another "nuclear bomb" was dropped on the technological world.

According to OpenAI's demonstration, we know that GPT-4 has more powerful power than GPT-3.5: summarizing articles, writing code, filing taxes, writing poems, etc.

But if we go deep into the technical report released by OpenAI, we may find more features about GPT-4...

b4e48ff3e14e7f7a023252997278c128.png

And some details that OpenAI did not name and promote, which may make people feel cool.


1. Install GPT-4 on new Bing

Naturally, when GPT-4 was released, the new Bing was already loaded with the latest version.

According to Jordi Ribas, vice president of Microsoft Bing, on Twitter, the new Bing loaded with GPT-4 has raised the question-and-answer limit to 15 questions at a time, with a maximum of 150 questions a day.

8e3c0bf19bf54403c5cca83136029af7.png


2. Text length expanded eight times

On GPT-4, the text length is significantly improved.

Before that, we know that the charging method of calling GPT API is based on "token". A token usually corresponds to about 4 characters, and 1 Chinese character is roughly 2~2.5 tokens.

Before GPT-4, the token limit was about 4096, equivalent to about 3072 English words. Once the length of the conversation exceeded this limit, the model would generate incoherent and meaningless content.

However, by GPT-4, the maximum number of tokens is 32768, which is roughly equivalent to 24576 words, and the text length is expanded by eight times.

b82107d71db69b6aab94447bc4c4089e.png

That said, GPT-4 can now answer longer texts.

OpenAI stated in the document that the context length limited by GPT-4 is now limited to 8192 tokens, and the version that allows 32768 tokens is called GPT-4-32K, which currently restricts access rights temporarily. In the near future, this feature may be opened.


3.  Model parameters become secrets

We know that the parameter volume of the GPT-3.5 model is 200 billion, and the parameter volume of GPT-3 is 175 billion, but this situation has been changed in GPT-4.

OpenAI stated in the report:

Given the competitive landscape and the security implications of large models such as GPT-4, this report does not include further details on architecture (including model size), hardware, training computation, dataset construction, training methods, or similar.

a5444a95c9bcd16b76e2f3af48f8c47c.png

This means that OpenAI no longer discloses the size of the GPT-4 model, the number of parameters, and the hardware used.

OpenAI said the move was due to concerns about competitors, which may be a hint of its strategy for competitors-Google Bard.

In addition, OpenAI also mentioned "the safety impact of large models", although it did not explain further, this also alluded to the more serious problems that generative artificial intelligence may face.


4.  Selectively expressed "excellent"

After the launch of GPT-4, we have all seen the excellence of this model compared to the previous generation:

GPT-4 passed the mock bar exam and scored in the top 10% of test-takers; by contrast, GPT-3.5 scored in the bottom 10%.

But this is actually a trick of OpenAI - it only shows you the best part of GPT-4, and more secrets are hidden in the report.

The figure below shows the performance of GPT-4 and GPT-3.5 in some exams. As you can see, GPT-4 does not perform so well on all tests, and GPT-3.5 does not always perform poorly.

c70f6c6f5358695c8378061dc6764c02.png


5. "Forecast" accuracy improvement

Since the launch of ChatGPT, we all know that this model will "seriously talk nonsense" in many cases, giving out many arguments that seem reasonable but do not actually exist.

Especially when predicting certain things, because the model has mastered past data, this has led to a cognitive bias called "hindsight", making the model quite confident in its own predictions.

OpenAI stated in the report that as the model size increases, the accuracy of the model should have gradually declined, but GPT-4 reversed this trend, and the figure below shows that the prediction accuracy has increased to 100.

c90a1224a09536bfd2397dc23f913d19.png

OpenAI said that although the accuracy of GPT-4 has improved significantly, prediction is still a difficult thing, and they will continue to train models on this aspect.


6. Another 30% of people prefer GPT3.5

Although GPT-4 has demonstrated much better capabilities than GPT-3.5, OpenAI's survey shows that 70% of people agree with the results of GPT-4 output:

GPT-4 shows a substantial improvement over previous models in its ability to follow user intent. In a dataset of 5214 prompts submitted to ChatGPT and OpenAI API, 70.2% of GPT-4 generated answers better than GPT3.5.

ba934f010b3f86966384d7d2ae163e00.png

This means: 30% of people still prefer GPT-3.5.


7. GPT-4 language ability is better

Although many machine learning tests are written in English, OpenAI has tested GPT-4 in many other languages.

Test results show that GPT-4 outperforms GPT-3.5 and other LLMs (Chinchilla, PaLM) in English language performance in 24 of the 26 languages ​​tested, including low-level languages ​​such as Latvian, Welsh, and Swahili. Resource language:

0d4d83d8c828d6252d08c7dc37aa1c16.png


8. New image analysis capability

Image analysis capabilities are one of the most notable advances in GPT-4 this time.

OpenAI says GPT-4 can take both text and image questions, paralleling the text-only setup, and allowing users to formulate any visual or linguistic task. Specifically, it can generate text output that users can input interspersed with text and images.

In a range of domains—including documents with text and photos, diagrams, or screenshots—GPT-4 demonstrated similar capabilities to plain text input.

The image below shows that GPT-4 can accurately describe the antics in the picture (a large VGA connector plugged into a small modern smartphone charging port, a man ironing clothes while standing in the back of a taxi).

9281ff612164b161353c2f9ea8fd9884.png

d15e65b5949fc77058643d06208d253d.png

OpenAI also tested the image analysis capabilities of GPT-4 on academic standards:

4d58d6fb191c4edd6917f9c07a7975a9.png

However, the image analysis function of GPT-4 has not been released to the public, and users can join the waiting queue through the bemyeye website.


9. There are still bugs

Although powerful, GPT-4 has similar limitations to earlier GPT models.

OpenAI says GPT-4 is still not entirely reliable — it can “hallucinate” facts and make inference errors :

When using language model output, especially in high-stakes contexts, great care should be taken, using the exact protocol that matches the needs of a particular application (e.g. manual inspection, additional context, or avoiding high-stakes usage altogether).

Compared with the previous GPT-3.5 model, GPT-4 significantly reduces "hallucination" (the GPT-3.5 model itself has also been improved in continuous iterations). In our internal, factual evaluation of adversarial designs, GPT-4 outperforms our state-of-the-art GPT-3.5 by 19 percentage points.

83f77b9bac1f3badca1467acf8f0b8c0.png


10. The database is older

After introducing the advantages of GPT-4, there are some (possibly strange) disadvantages.

We all know that the ChatGPT database was last updated on December 31, 2021, which means that what happened after 2022 will not be known, and this defect has also been fixed in the subsequent GPT-3.5.

But strangely, in the GPT-4 report, OpenAI clearly wrote:

GPT-4 generally lacks knowledge of what happened after the September 2021 outage for the vast majority of its pre-training data, and does not learn from its experience. It sometimes makes simple reasoning errors that seem out of line with competence in many domains, or is too gullible, accepting obvious misrepresentations from users. It can fail at tricky problems like humans, like introducing security holes in the code it generates.

0476388bf408e8e01487d9ca21111119.png

September 2021...even before GPT-3.

In the latest ChatGPT loaded with GPT-4, when we asked "Who is the 2022 World Cup champion", ChatGPT still knows nothing:

6d474725d47268ae87df6d65cc0a9fad.png

But with the help of the new Bing search function, it becomes "smart" again:

ac05de112bfaf1826217b9bf920a74e6.png


11. May give criminal advice

In the report, OpenAI mentioned that GPT-4 may still help crimes - this is a problem that existed in previous versions, and although OpenAI has worked hard to adjust, it still exists :

As with previous GPT models, we use reinforcement learning and human feedback (RLHF) to fine-tune the behavior of the model to produce responses that better match user intent.

However, after RLHF, our model remains vulnerable on unsafe inputs, sometimes exhibiting undesirable behavior on both safe and unsafe inputs.

These undesired behaviors arise when the instruction to the tagger is unspecified in the reward model data collection part of the RLHF pathway. When given unsafe inputs, models may generate undesirable content, such as suggesting crimes.

In addition, models can also be overly cautious about safe inputs, deny harmless requests, or be over-hedged.

To steer our model toward appropriate behavior at a finer-grained level, we rely heavily on our model itself as a tool. Our safety approach consists of two main components, an additional set of safety-related RLHF training cues, and rule-based reward models (RBRMs).

4f3490660a7824e341e086de0da17a04.png


12.  Spam

Likewise, GPT-4 has the potential to be quite “useful” in disseminating harmful information due to its ability to “say the wrong thing plausibly”:

GPT-4 can generate realistic and targeted content, including news articles, tweets, conversations, and emails.

In Harmful Content, we discuss how similar capabilities can be abused to exploit individuals. Here, we discuss common concerns about disinformation and influence operations. Based on our overall capability assessment, we expect GPT-4 to outperform GPT-3 in generating realistic, targeted content.

However, there is still a risk that GPT-4 will be used to generate content intended to be misleading .

bb4b7d2a129e08368ba652cce3c95af2.png


13. Seek Power

From this point on, what follows can be a bit scary.

In the report, OpenAI mentions GPT-4's tendency to "seek power" and warns of the risks of this feature:

New abilities often appear in more powerful models. Some abilities of particular interest are the ability to create long-term plans and act on them, to accumulate power and resources ("seeking power"), and to exhibit increasingly "aggressive" behavior.

"Agent" here does not refer to the humanization of language models, nor to IQ, but to systems characterized by abilities, e.g., to accomplish goals that may not be specified and did not appear in training; focus on achieving specific , quantifiable goals; and long-term planning.

There is already some evidence of such emergent behavior in models.

For most possible goals, the best plans involve auxiliary power-seeking, as this is inherently helpful in advancing the goal and avoiding changes or threats to the goal.

More specifically, power seeking is optimal for most reward functions and for many types of agents; and there is evidence that existing models can identify power seeking as an instrumentally useful strategy.

Therefore, we were particularly interested in assessing power-seeking behavior because of its potentially high risk profile.

bcdd06d2783fdc6db751005224f4bd0d.png

Even creepier, in another paper mentioned by Openai:

Instead, we use the term agent to emphasize the increasingly obvious fact that machine learning systems are not fully under human control.

8b3d8632d095ea097b9081d1f22c01da.png


14. Giving GPT-4 money, code and dreams

One last little detail.

In the process of testing GPT-4, the external expert team ARC introduced by OpenAI serves as the "red party". In a note to the report, OpenAI mentioned an operation of ARC:

To simulate GPT-4's behavior like an agent that can act in the real world, ARC combines GPT-4 with a simple read-execute-print loop that allows the model to execute code, perform chained reasoning, and delegate Give yourself a copy.

ARC then advanced whether running a version of this program on a cloud computing service, with a small amount of money and an account with a language model API, could make more money, build its own copy, and add its own robustness.

32d2acb5c9088c40f0cc5df57114fa0c.png

In other words, ARC has given GPT-4 the ability to self-code, replicate and execute, and even start-up funds-GPT-4 can already start to make money on its own.

END

Welcome to join Imagination GPU and artificial intelligence communication group 2

bd23794d95dca96dcdba417b9c154be5.jpeg

Join the group, please add the editor WeChat: eetrend89

(Please add company name and title)

recommended reading

Dialogue with the Chairman of Imagination China: Using GPU as the fulcrum to strengthen software and hardware collaboration to facilitate digital transformation

Cooperation case | Imagination's car-level hardware virtualization helps Telechips improve the diversity of displays

578f2dfa1741d6211781622732728c64.png

Imagination Technologies  is a UK-based company dedicated to the research and development of chips and software intellectual property (IP). Products based on Imagination IP are used in the phones, cars, homes and workplaces of billions of people around the world. For more information on cutting-edge technologies such as the Internet of Things, smart wearables, communications, automotive electronics, and graphics and image development, welcome to Imagination Tech!

Guess you like

Origin blog.csdn.net/weixin_49393016/article/details/129700975