ChatGPT gets a major upgrade: it can now see images, hear speech, and talk!

On September 25 (Eastern Time), OpenAI announced on its official website a major upgrade to ChatGPT that adds three capabilities: understanding images, understanding speech, and replying with a synthesized voice.

Back in March this year, when OpenAI released the GPT-4 model, it demonstrated image understanding, but the feature was not made available, for reasons including safety concerns and incomplete functionality. Now ChatGPT can both view images and recognize speech. This is an important technical step in OpenAI's strategy to achieve AGI (artificial general intelligence).

OpenAI said it will roll out the image and voice features to Plus and Enterprise users over the next two weeks. Voice will be available on iOS and Android, and image understanding will be available on all platforms.

Communicate with ChatGPT using voice

ChatGPT’s new speech feature is powered by a text-to-speech model capable of generating human-like audio from just text and a few seconds of sample speech.

OpenAI worked with professional voice actors to create five synthetic voices. It also uses Whisper, its own open-source speech recognition system, to transcribe the user's speech into text.

Put simply, users who want to turn text directly into speech will be able to do so in ChatGPT.

For example, give ChatGPT a text story about a kitten, select one of the voices, and it converts the story to speech with one click. Once the conversion is complete, users can download the audio clip.
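
As a rough picture of the voice flow described in this section (speech is transcribed to text, the model replies in text, and the reply is synthesized as speech), here is a minimal Python sketch. It is only an illustration: `VoiceTurn`, `run_voice_turn`, the voice name `voice-1`, and the three stand-in functions are hypothetical, and none of them call Whisper, ChatGPT, or any OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a spoken conversation (hypothetical structure)."""
    user_text: str      # text produced by the speech recognizer
    reply_text: str     # the assistant's text reply
    reply_audio: bytes  # synthesized speech for the reply


def run_voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    voice: str = "voice-1",  # hypothetical voice identifier
) -> VoiceTurn:
    """Chain the three stages: speech -> text -> reply -> speech."""
    user_text = transcribe(audio_in)
    reply_text = chat(user_text)
    reply_audio = synthesize(reply_text, voice)
    return VoiceTurn(user_text, reply_text, reply_audio)


# Stand-ins so the sketch runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_chat(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    # Tag the text with the voice name instead of producing real audio.
    return f"[{voice}] {text}".encode("utf-8")


turn = run_voice_turn(
    b"tell me a story", fake_transcribe, fake_chat, fake_synthesize
)
print(turn.reply_text)
```

Passing the three stages in as callables keeps the sketch independent of any particular speech or language model; a real recognizer or synthesizer could be substituted without changing `run_voice_turn`.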

Ask ChatGPT about pictures

Users can show ChatGPT one or more pictures and ask questions about them. For example, send a photo of a barbecue grill that won't start and ask what is wrong, or take a photo of the ingredients in the refrigerator and ask for meal ideas.

If users want to ask about only part of a picture, they can use the drawing tool in the mobile app to mark that area and then ask their question.

ChatGPT's image understanding is powered by GPT-3.5 and GPT-4. The image types it can understand include photos, screenshots, and images containing text.

Providing safe AI services

OpenAI said its goal is to build AGI (artificial general intelligence) that is both safe and beneficial, which is why it is rolling out ChatGPT's features gradually. A gradual rollout gives OpenAI time to make improvements and to reduce risks and vulnerabilities as they are found.

In particular, the new speech technology can generate realistic synthetic voices from just a few seconds of sample speech, which could make things easier for scammers. This cautious approach is therefore especially important for advanced models that involve speech and vision.

Spotify is already using this voice technology to develop a voice translation feature that automatically translates podcasters' voices into other languages to help them reach a wider audience. Be My Eyes has integrated ChatGPT's image understanding into its app to serve blind and partially sighted users.

The material in this article comes from OpenAI's official website. If there is any infringement, please contact us to have it removed.

Origin blog.csdn.net/weixin_57291105/article/details/133313733