MiniGPT-4, the latest multimodal model, is open source | Try GPT-4-style image recognition early | LLM based on Vicuna | Generates image descriptions | Builds a website from handwritten instructions

Overview

The latest multimodal model, MiniGPT-4, is open source. Its language side is Vicuna, an advanced large language model (LLM) built on LLaMA, which is reported to reach about 90% of ChatGPT's quality on text tasks. For visual perception, the authors adopted the same pre-trained visual component as BLIP-2, which consists of the ViT-G/14 from EVA-CLIP and a Q-Former. MiniGPT-4 adds only a single mapping (projection) layer to align the encoded visual features with the Vicuna language model; all other visual and language component parameters are frozen.
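To make the role of the mapping layer concrete, here is a minimal sketch in plain Python. It is not the MiniGPT-4 code: the function names (`make_linear`, `project`, `build_llm_input`) and the toy dimensions are assumptions chosen for illustration, and the frozen visual encoder and language model are replaced by placeholder vectors. It only shows the idea that a single linear layer maps each visual token into the language model's embedding size, after which the visual tokens are placed in front of the text embeddings.

```python
import random

def make_linear(in_dim, out_dim, seed=0):
    """Create the weights of one linear layer (the only trainable part in this sketch)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def project(tokens, weight, bias):
    """Apply y = W x + b to every token vector."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]

def build_llm_input(visual_tokens, text_embeddings, weight, bias):
    """Project the visual tokens, then prepend them to the text embeddings."""
    return project(visual_tokens, weight, bias) + text_embeddings

# Toy sizes (assumptions): the real models use far larger dimensions.
NUM_QUERIES, VISUAL_DIM, LLM_DIM = 4, 6, 8

weight, bias = make_linear(VISUAL_DIM, LLM_DIM)

# Placeholders for the outputs of the frozen visual encoder and the
# frozen language model's embedding table.
visual_tokens = [[0.5] * VISUAL_DIM for _ in range(NUM_QUERIES)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

sequence = build_llm_input(visual_tokens, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))  # number of tokens, width of each token
```

In this sketch only `weight` and `bias` would be updated during training, which mirrors the description above: the visual and language components stay fixed and the projection learns to bridge them.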

Introduction to MiniGPT-4

More than a month has passed since GPT-4 was released, but its image recognition feature is still not available. Researchers from King Abdullah University of Science and Technology have released a similar model, MiniGPT-4, which you can try today.

For humans, understanding a picture is trivial: we can say what it means without thinking. In the picture below, for example, the charger plugged into the phone is clearly the wrong kind. A person sees the problem at a glance, but for AI this is still very difficult.

GPT-4 can quickly point out the problem in the picture: a VGA cable is not suitable for charging an iPhone.

In fact, GPT-4's appeal goes far beyond this. Even more exciting is generating a website directly from a hand-drawn sketch: draw a rough layout on scrap paper, take a photo, and send it to GPT-4, which quickly writes the web page code to match the sketch.

Unfortunately, this GPT-4 feature is still not open to the public, so there is no way to try it yourself. Someone can't wait, though, from King Abdullah


Origin blog.csdn.net/weixin_41259045/article/details/130320587