MiniGPT-4 is a free, open-source AI tool that can understand images, and it delivers capabilities that GPT-4 does not yet offer to users.

We have already seen GPT-4's multimodal capabilities, and they were simply amazing. In particular, it can create a complete website from a rough hand-drawn sketch, which gave us an early taste of how powerful a multimodal GPT-4 can be.

About MiniGPT-4

Now a project called MiniGPT-4 has been open-sourced by several PhDs from King Abdullah University of Science and Technology. It provides image understanding capabilities similar to GPT-4's and attracted a huge amount of attention within just a few days.

So what can it do? Let's take a look at the official video. It can identify spots on a leaf, conclude that the plant is diseased, and suggest treatments. It can also tell that a picture of a cactus shows a scene that could not happen in the real world, because cacti do not grow in such a cold environment.

It can also write advertising copy for products shown in pictures and suggest recipes for the dishes it sees. Even more impressive, it can create an entire website from just a few lines of instructions. That is amazing, so we had to try it!

How to use MiniGPT-4

Open the MiniGPT-4 demo page, upload a picture in the left panel, and click Upload & Start Chat.

Then type a prompt in the User box on the right, asking it to describe the picture and say what is interesting about it.

After a short wait, it gave its answer:

This image shows a dog lying on the floor with his paws in his shoes. The dog's expression is curious and playful, as if he's trying to figure out what the shoe is for. This scene is interesting because it shows the dog's playful nature and curiosity about shoes.

Most of the description is accurate, but when it gets to the white shirt and blue jeans, you will find that it is really not good at describing what people are wearing. It also supports Chinese. We then asked it to write a story based on a picture, and it did so quickly.

MiniGPT-4 project features

  • The first is multimodality, which means it can understand pictures. In this example, it can answer what the picture is about, how many colors it has, and even what style the picture belongs to.
  • The second is low cost: training takes only 4 A100 GPUs and only about 10 hours. It can definitely be called a mini model.
  • The third is that the entire project is open source. The GitHub address of the project is https://github.com/Vision-CAIR/MiniGPT-4. The project is also very generous and provides 7 demo links for everyone to try.
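
To put the training cost in perspective, here is a rough back-of-the-envelope sketch. It assumes the figures above refer to 4 A100 GPUs running for about 10 hours. The hourly price is a made-up placeholder, not a quoted rate, so swap in whatever your cloud provider charges.

```python
def training_gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU-hours consumed by a training run."""
    return num_gpus * hours


def estimated_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Rough rental cost for a given number of GPU-hours."""
    return gpu_hours * price_per_gpu_hour


# Assumed figures: 4 A100 GPUs running for about 10 hours.
gpu_hours = training_gpu_hours(num_gpus=4, hours=10)

# Hypothetical rental price per A100-hour; replace with your provider's rate.
HYPOTHETICAL_PRICE_PER_GPU_HOUR = 2.0

print(f"GPU-hours: {gpu_hours}")
print(f"Estimated cost: {estimated_cost(gpu_hours, HYPOTHETICAL_PRICE_PER_GPU_HOUR)}")
```

Under these assumptions the whole run comes to about 40 GPU-hours, which is tiny for a model of this kind.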

Summary

According to MiniGPT-4's experimental results, GPT-4's advanced capabilities can in theory be attributed to its use of a more advanced large language model. In other words, products built on these large models for images, sound, video, and other fields should perform reasonably well in practice.

This project has also confirmed that large language models are feasible in the image field. Next, I believe many developers will rush to join in and extend GPT-4-style capabilities to audio, video, and other fields, so that we can all see more interesting and amazing AI applications. Well, that's it for today.

Origin blog.csdn.net/weixin_43938890/article/details/130316794