Understand OpenAI in one article: a CSDN exclusive, comprehensive analysis

Table of contents

What is OpenAI

The development history of OpenAI

Explanation of related terms

API

GPT

GPT-2

GPT-3

GPT-3.5

GPT-4

ChatGPT

ChatGPT Plus

Transformer model

Codex

Whisper

MuseNet

Microscope

DALL-E & CLIP


What is OpenAI

OpenAI is an American artificial intelligence (AI) research laboratory comprised of the nonprofit OpenAI Incorporated (OpenAI Inc.) and its for-profit subsidiary, OpenAI Limited Partnership (OpenAI LP). OpenAI conducts AI research with the aim of promoting and developing friendly AI. The OpenAI system runs on the fifth most powerful supercomputer in the world. The organization was founded in San Francisco in 2015 by Sam Altman, Reid Hoffman, Jessica Livingston, Elon Musk, Ilya Sutskever, Peter Thiel and others, who collectively pledged $1 billion. Musk resigned from the board in 2018 but remains a donor. Microsoft provided OpenAI LP with a $1 billion investment in 2019 and a second multi-year investment in January 2023, reportedly worth $10 billion.

The development history of OpenAI

  • 2015.12 - Sam Altman, Greg Brockman, Reid Hoffman, Jessica Livingston, Peter Thiel, Elon Musk, Amazon Web Services (AWS), Infosys and YC Research announced the formation of OpenAI and committed to invest more than $1 billion in the venture. The group said it would "collaborate freely" with other institutions and researchers by making its patents and research open to the public.
  • 2016.04 - OpenAI released the public beta of its reinforcement learning research platform "OpenAI Gym".
  • 2016.12 - OpenAI released "Universe," a software platform for measuring and training AI-powered general intelligence in games, websites, and other applications around the world.
  • 2018 - Musk resigned from his board seat, citing a "potential future conflict of interest" with his role as Tesla's CEO given Tesla's development of AI for self-driving cars, but remained a donor.
  • 2019 - OpenAI shifted from a non-profit to a "capped-profit" structure to attract capital, with profits capped at 100 times any investment ( OpenAI shifts from nonprofit to 'capped-profit' to attract capital ). The capped-profit model allows OpenAI LP to legally attract investment from venture funds and, in addition, to grant employees shares in the company.
  • 2020 - OpenAI released GPT-3, a language model trained on large internet datasets. GPT-3 is designed to answer questions in natural language, but it can also translate between languages and generate coherent ad-hoc text. OpenAI also announced an associated API, simply called "the API," that would form the core of its first commercial product.
  • 2021 - OpenAI introduced DALL-E, a deep learning model that can generate digital images from natural language descriptions.
  • 2022.12 - OpenAI received extensive media coverage after launching a free preview of ChatGPT, a new AI chatbot based on GPT-3.5. According to OpenAI, the preview received more than one million registrations in its first five days, and the service reached 100 million users within two months of launch, making it the fastest-growing app ever.
  • 2023.01 - OpenAI was in talks for financing that would value the company at $29 billion, roughly double its 2021 valuation. On January 23, 2023, Microsoft announced a new multi-year, multi-billion-dollar investment (reportedly $10 billion) in OpenAI. The investment is believed to be part of Microsoft's effort to integrate OpenAI's ChatGPT into the Bing search engine. After the launch of ChatGPT, Google announced a similar AI application (Bard), fearing that ChatGPT would threaten Google's position as the preferred source of information.
  • 2023.02.07 - Microsoft announced that it is building AI technology based on the same foundation as ChatGPT into Microsoft Bing, Edge, Microsoft 365 and other products.
  • 2023.02.15 - The domain name AI.com now redirects to the ChatGPT website; the domain was reportedly acquired for $11 million in 2021.09 ( AI.com Now Forwarding to ChatGPT Website ).
  • 2023.02.28 - Microsoft announced a major update to Windows 11 with a host of features that use the power of AI to improve the way people get work done on their PCs. As part of this update, the new Bing is being brought directly to the Windows taskbar, unlocking even more ways to interact with the PC, including search, answer, chat, and create ( Introducing the new Bing in Windows 11 ).

Explanation of related terms

API

In 2020.06, OpenAI announced a general-purpose API, described as being "for accessing new AI models developed by OpenAI," which developers can call to perform "any English-language AI task."
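
As a rough illustration, a call to the API looks like the sketch below. This is a minimal example assuming the official `openai` Python package (pre-1.0 interface), a valid API key in the environment, and an illustrative model name; it is not an authoritative reference.

```python
# Minimal sketch of calling the OpenAI API (assumes the pre-1.0 `openai`
# Python package and that OPENAI_API_KEY is set in the environment).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Ask a GPT-3-family model to complete a prompt (model name is illustrative).
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Summarize what OpenAI is in one sentence.",
    max_tokens=60,
    temperature=0.7,
)

print(response["choices"][0]["text"].strip())
```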

GPT

Number of parameters: 120 million. Training data: BookCorpus, a corpus of 7,000 unpublished books with a total size of 4.5 GB, covering a variety of literary genres and topics.

A generative pre-trained transformer (GPT) is a natural language generation model built on the transformer architecture. It can be fine-tuned to perform various natural language processing tasks, such as text generation, code generation, video generation, text question answering, image generation, paper writing, film and television creation, and scientific experiment design. Trained on large amounts of corpus data, it generates text that resembles natural human language. The "pre-training" in its name refers to the initial training process on a large text corpus, during which the model learns to predict the next word in a passage; this gives the model a solid foundation for performing well on downstream tasks with limited task-specific data.
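
As a sketch of what "learning to predict the next word" means formally: given a token sequence w_1, ..., w_T, pre-training minimizes the standard autoregressive language-modeling loss,

```latex
L(\theta) = -\sum_{t=1}^{T} \log P_\theta\left(w_t \mid w_1, \ldots, w_{t-1}\right)
```

i.e., the model is trained to maximize the probability it assigns to each next token given all the tokens that precede it.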

The original paper on GPT ( Improving Language Understanding by Generative Pre-Training ) was written by Alec Radford and colleagues, and was published on OpenAI's website as a preprint on 2018.06.11. It shows how generative models of language can acquire world knowledge and handle long-range dependencies by pre-training on diverse corpora with long stretches of continuous text.

GPT-2

Number of parameters: 1.5 billion. Training data: WebText, a corpus of eight million documents with a total size of 40 GB. The text was collected from 45 million webpages linked from highly upvoted Reddit posts, covering a variety of topics and sources such as news, forums, blogs, Wikipedia, and social media.

Generative Pre-trained Transformer 2 (GPT-2) is an open-source artificial intelligence model created by OpenAI in 2019.02. GPT-2 can translate text, answer questions, summarize passages, and generate text output. While its output is sometimes human-like, it can become repetitive or nonsensical when generating long passages. GPT-2 is a general-purpose learner, not specifically trained to perform any particular task, and was created as a "direct scale-up" of OpenAI's 2018 GPT model, with a roughly tenfold increase in both the number of parameters and the size of the training dataset.
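
Because the GPT-2 weights were eventually released as open source, the model can be run locally. Below is a minimal sketch, assuming the Hugging Face `transformers` library (not OpenAI's original code) and the publicly hosted "gpt2" checkpoint.

```python
# Sketch of text generation with the released GPT-2 weights,
# assuming the Hugging Face `transformers` package is installed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("OpenAI was founded in San Francisco in 2015", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,                     # length of the continuation
    do_sample=True,                        # sample instead of greedy decoding
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```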

Some experts expressed skepticism that GPT-2 posed a significant threat. The Allen Institute for Artificial Intelligence responded to GPT-2 with a tool to detect "fake news" ( Could 'fake text' be the next global political threat? ). Other researchers, such as Jeremy Howard, warned of "techniques that could completely fill tweets, emails, and the web with plausible-sounding, context-appropriate prose, drowning out all other speech and becoming impossible to filter." In 2019.11, OpenAI released the complete version of the GPT-2 language model.

GPT-3

Number of parameters: 175 billion. Training data: a large-scale text corpus with a total size of 570 GB, containing about 400 billion tokens. The data comes mainly from Common Crawl, WebText, English Wikipedia, and two book corpora, Books1 and Books2.

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to generate natural language that humans can understand. GPT-3 was trained and developed by OpenAI, and its design is based on the Transformer architecture developed by Google. GPT-3's neural network contains 175 billion parameters and requires 800 GB of storage, making it one of the largest neural network models created up to that point. The model demonstrates strong zero-shot and few-shot capabilities on many tasks.
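
"Few-shot" here means the task is specified entirely inside the prompt, with no fine-tuning or weight updates. A sketch of such a prompt (the English-to-French example follows the format used in the GPT-3 paper) looks like this:

```python
# Few-shot prompting sketch: the task is demonstrated by examples in the prompt
# and the model is expected to continue the pattern, without any training.
few_shot_prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Sent to a GPT-3 model (for example via the API call shown earlier),
# the expected continuation is "fromage".
```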

OpenAI published the GPT-3 paper ( Language Models are Few-Shot Learners ) in 2020.05 and released a beta version of the application programming interface to a small number of companies and developer groups the following month. Microsoft announced on 2020.09.22 that it had obtained an exclusive license to GPT-3.

GPT-3.5

On 2022.03.15, OpenAI made new versions of GPT-3 and Codex available in its API with edit and insert capabilities, under the names "text-davinci-002" and "code-davinci-002". These models were described as more capable than previous versions and were trained on data up to 2021.06. On 2022.11.30, OpenAI began referring to these models as the "GPT-3.5" series, and released ChatGPT, which was fine-tuned from a model in the GPT-3.5 series.

GPT-4

Generative Pre-trained Transformer 4 (GPT-4) is an unreleased neural network created by OpenAI. According to The New York Times, it is "rumored to be coming out in 2023"; Vox reports that other outlets say it is rumored to be better than OpenAI's previously released GPT-3 and GPT-3.5. The Verge also cites rumors that it will drastically increase the number of parameters over GPT-3 (from 175 billion to 100 trillion), which OpenAI CEO Sam Altman has described as "complete nonsense."

ChatGPT

Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence chatbot developed by OpenAI and launched in 2022.11. The program uses a large language model based on the GPT-3.5 architecture and is fine-tuned with reinforcement learning. ChatGPT interacts in text form: in addition to natural human-like dialogue, it can handle relatively complex language tasks, including automatic text generation, question answering, and summarization. For example, it can generate texts such as scripts, songs, and plans based on an input prompt, and it can answer questions posed to it. It can also write and debug computer programs. During the promotional period, anyone can register for free and, after logging in, chat with the AI at no cost.
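
Programmatic access to GPT-3.5-based chat models is also offered through the API. Below is a minimal sketch, assuming the pre-1.0 `openai` Python package, a valid API key, and the `gpt-3.5-turbo` chat model; the message contents are illustrative.

```python
# Sketch of a chat-style request against a GPT-3.5 chat model
# (assumes the pre-1.0 `openai` Python package and OPENAI_API_KEY in the environment).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a four-line poem about the sea."},
    ],
)
print(response["choices"][0]["message"]["content"])
```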

ChatGPT can write articles that read as if written by a real person, and it quickly gained attention for its detailed and articulate answers across many areas of knowledge, showing that it can handle knowledge work previously thought to be beyond the reach of AI. Its impact on the financial and white-collar labor markets has been considerable, but its uneven factual accuracy is considered a major flaw, and the ideological bias embedded in its training is seen as needing careful correction. After ChatGPT was released in 2022.11, OpenAI's valuation rose to $29 billion, and the number of users reached 100 million within two months of launch.

ChatGPT Plus

ChatGPT Plus is a $20-per-month subscription service that gives users access to ChatGPT even during peak hours, faster response times, and early access to new features.

Transformer model

The Transformer is a deep learning model that uses a self-attention mechanism, which assigns different weights to different parts of the input data according to their importance. The model is mainly used in the fields of natural language processing (NLP) and computer vision (CV).

Like recurrent neural networks (RNNs), the Transformer is designed to process sequential input data such as natural language and can be applied to tasks such as translation and text summarization. Unlike RNNs, however, the Transformer processes all of the input data at once: the attention mechanism provides context for any position in the input sequence, so for natural language input the Transformer does not have to process one word at a time the way an RNN does. This architecture allows for more parallel computation and thus reduces training time.
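
The self-attention mechanism at the core of the Transformer can be sketched in a few lines of NumPy. This is a simplified single-head version for illustration only; real models add learned projection matrices, multiple heads, positional encodings, and masking.

```python
# Simplified single-head scaled dot-product attention, for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings; self-attention means Q = K = V.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): every token now carries context from all other tokens
```

Because every token attends to every other token in a single matrix operation, the whole sequence can be processed in parallel, which is the source of the training-time advantage described above.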

The Transformer model was launched by a team at Google Brain in 2017, and has gradually replaced RNN models such as Long Short-Term Memory (LSTM) as the model of choice for NLP problems. The parallelization advantage allows it to be trained on larger datasets. This also contributed to the development of pre-training models such as BERT and GPT. These systems are trained using large corpora such as Wikipedia, Common Crawl, etc., and can be fine-tuned for specific tasks.

Codex

Codex, announced in mid-2021, is a descendant of GPT-3 that was additionally trained on code from 54 million GitHub repositories; it is the AI that powers the code auto-completion tool GitHub Copilot. In 2021.08, an API was released as a private beta. According to OpenAI, the model can produce working code in more than a dozen programming languages, most effectively Python.
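
In the API, Codex was exposed through the same completion interface as GPT-3, only with code-oriented model names. A minimal, hedged sketch, assuming the pre-1.0 `openai` Python package and the `code-davinci-002` model name referenced earlier:

```python
# Sketch of natural-language-to-code generation with a Codex model
# (assumes the pre-1.0 `openai` Python package; model name as referenced above).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="code-davinci-002",
    prompt='"""Write a Python function that returns the n-th Fibonacci number."""\n',
    max_tokens=150,
    temperature=0,   # deterministic output is usually preferable for code
)
print(response["choices"][0]["text"])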

Whisper

Released in 2022, Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
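
Whisper was released as open source, so it can be run locally. A minimal sketch, assuming the `openai-whisper` Python package (plus ffmpeg) and a placeholder local audio file:

```python
# Minimal transcription sketch using the open-source `openai-whisper` package.
# "audio.mp3" is a placeholder for any local audio file.
import whisper

model = whisper.load_model("base")        # small multilingual checkpoint
result = model.transcribe("audio.mp3")    # language is detected automatically
print(result["text"])
```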

MuseNet

MuseNet is a deep neural network that can generate 4-minute musical compositions with 10 different instruments and can combine styles from country music to Mozart to the Beatles. Rather than being explicitly programmed with an understanding of music, MuseNet discovers patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technique as GPT-2: a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.

Microscope

OpenAI Microscope is a collection of visualizations of every significant layer and neuron of eight vision "model organisms" that are frequently studied in interpretability research. Microscope makes it easier to analyze the features that form inside these neural networks, and OpenAI hopes it will help the research community in the effort to understand these complex systems.

DALL-E & CLIP

Released in 2021, DALL-E is a Transformer model that creates images from textual descriptions.

CLIP, also released in 2021, does the opposite: it produces a description for a given image. DALL-E uses a 12-billion-parameter version of GPT-3 to interpret natural language inputs (such as "a green leather wallet shaped like a pentagon") and generate corresponding images. It can create images of realistic objects ("a stained glass window with an image of a blue strawberry") as well as objects that do not exist in reality ("a cube with the texture of a porcupine").
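
In practice, the released CLIP checkpoints are most often used to score how well candidate captions match an image. A minimal sketch, assuming the Hugging Face `transformers` implementation of CLIP and a placeholder image file:

```python
# Sketch of matching candidate captions to an image with CLIP
# (assumes the Hugging Face `transformers` package; "photo.jpg" is a placeholder).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
captions = ["a photo of a cat", "a photo of a dog", "a diagram of a neural network"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)   # one score per caption
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.2f}  {caption}")
```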
