A step-by-step guide to building the "AI Reading Pictures and Writing Poems" web page project with Python and Wenxin Yiyan (complete project source code attached)

At the beginning of this year, the popularity of ChatGPT set off an upsurge in the development of large AI models around the world, and technology companies at home and abroad have joined the "Battle of 100 Models". After Baidu took the lead in releasing the first domestic artificial intelligence large language model "Wenxin Yiyan", it launched the Wenxin Qianfan large model platform to help enterprises and developers accelerate the implementation of large model applications.

Recently, Baidu founder, chairman and CEO Robin Li revealed some big news about Baidu World Conference 2023 at an event: at the conference on October 17, he will "teach you step by step how to make AI native applications".

I also had a sudden idea: use Wenxin Yiyan to build a small AI application, "Look at Pictures and Write Poems". No sooner said than done, so follow along with me to build this web page!


1. Implementation ideas

1. Design a web page that accepts an uploaded picture and the Wenxin Yiyan access token.

2. Python calls the image recognition interface of Baidu Smart Cloud to identify the category and content of the picture.

3. Python calls the Wenxin Yiyan interface, passes in the recognized image keywords, and has Wenxin Yiyan write poems.

4. The Python backend returns the poems to the web page.
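
The four steps above can be sketched as one small function. This is a minimal sketch, not the project's actual code: `poem_from_image`, `recognize`, and `generate` are hypothetical names, and the two services are passed in as plain callables so that the flow can be tried with stand-ins before any API keys are configured.

```python
from typing import Callable, List


def poem_from_image(
    image_bytes: bytes,
    recognize: Callable[[bytes], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run the flow: image -> keywords -> prompt -> poem text."""
    # Step 2: image recognition returns a list of keywords.
    keywords = recognize(image_bytes)
    if not keywords:
        raise ValueError("no keywords recognized in the image")
    # Step 3: build the prompt, using the same wording as later in this article
    # ("Based on <keywords>, write 10 two-line seven-character poems").
    prompt = "根据" + ", ".join(keywords) + "写10首两句七言诗"
    # Step 4: the generated text is what the backend returns to the page.
    return generate(prompt)


if __name__ == "__main__":
    # Stand-ins for the two real services, for illustration only.
    fake_recognize = lambda data: ["树", "瀑布"]
    fake_generate = lambda prompt: "(poem for: " + prompt + ")"
    print(poem_from_image(b"fake image bytes", fake_recognize, fake_generate))
```

Keeping the two services behind callables means the real Baidu Smart Cloud and Wenxin Yiyan calls can be swapped in later without changing the flow itself.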

2. Steps to implement the web page of "AI Looks at Pictures and Writes Poems"

2.1 Web front-end

The front end of the web page uses HTML, CSS, and JavaScript to upload and display a picture, to accept the Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, and PaddlePaddle AI Studio access token, and to provide a button that triggers poem writing. The interface is as follows:

2.2 Image recognition

The Baidu image recognition interface is an artificial intelligence service provided by Baidu that performs high-precision content recognition on pictures. It supports a variety of recognition tasks, including general object recognition, scene recognition, text recognition, and animal recognition. Here we call the free general object recognition interface directly from Python, which saves a great deal of development work.

2.2.1 Install Baidu Smart Cloud Python SDK

Baidu Smart Cloud Python SDK can be installed through pip. Enter the following command in the terminal:

pip install baidu-aip 

Once installed, you can import the package in your Python code:

from aip import AipImageClassify

2.2.2 Create application

1. Log in to the official website of Baidu Smart Cloud: https://console.bce.baidu.com

2. Find and click Products > Artificial Intelligence > Image Recognition

3. After logging in to the console, click Free Trial:

4. Check all and click 0 yuan to receive:

5. After the creation is successful, click the application to enter the application details page, enter the application management menu, and click API Key to view the API Key and Secret Key, which are used by Python code to call the API.


2.2.3 Python code testing

After the above two steps of preparation, we can start writing Python code to implement Baidu intelligent image recognition. The following is a simple test example that requires modifying the AppId, API Key, Secret Key and image path:

from aip import AipImageClassify

# Baidu Smart Cloud API credentials
APP_ID = 'your App ID'
API_KEY = 'your API Key'
SECRET_KEY = 'your Secret Key'

# Create the image recognition client
client = AipImageClassify(APP_ID, API_KEY, SECRET_KEY)

# Path of the image to recognize
filePath = "1.png"

# Read the image file as bytes
with open(filePath, 'rb') as fp:
    image = fp.read()

# Optional parameters (baike_num: number of Baidu Baike entries to return)
options = {"baike_num": 1}

# Call the general object and scene recognition interface
result = client.advancedGeneral(image, options)

# Print the recognized keywords
for res in result['result']:
    print(res['keyword'], end=", ")

The picture I used as input is this landscape photo, which is a challenging case because it contains no text:

The recognition result picks up all the main elements in the picture:


Next, we need to pass the recognized image content to Wenxin Yiyan so that it can write a poem.
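
Before the keywords are passed on, it can help to guard against a failed recognition call and to drop low-confidence labels. The sketch below assumes the response is a dict whose `result` list holds entries with `keyword` and `score` fields, and that a failed call carries `error_code` and `error_msg` instead. The helper name `extract_keywords` and the 0.1 threshold are hypothetical choices for illustration, not part of the SDK.

```python
from typing import Any, Dict, List


def extract_keywords(result: Dict[str, Any], min_score: float = 0.1) -> List[str]:
    """Return keywords from a recognition response, skipping weak matches."""
    # Assumption: a failed call reports error_code / error_msg.
    if "error_code" in result:
        raise RuntimeError(
            f"recognition failed: {result.get('error_msg', 'unknown error')}"
        )
    keywords = []
    for item in result.get("result", []):
        # Assumption: each entry has a confidence score between 0 and 1.
        score = float(item.get("score", 1.0))
        if score >= min_score and item.get("keyword"):
            keywords.append(item["keyword"])
    return keywords


if __name__ == "__main__":
    # Hand-written sample data, not real API output.
    sample = {"result": [
        {"keyword": "瀑布", "score": 0.82},
        {"keyword": "山峦", "score": 0.45},
        {"keyword": "建筑", "score": 0.02},
    ]}
    print(", ".join(extract_keywords(sample)))
```

Filtering by score is a design choice: it keeps stray labels out of the prompt, at the cost of possibly dropping a detail the poem could have used.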

2.3 Writing poems with Wenxin Yiyan

The ERNIE Bot SDK provides a convenient, easy-to-use interface to Wenxin Yiyan's capabilities, including text creation, general dialogue, semantic vectors, and AI drawing, and it comes with a free quota of 1 million tokens.

2.3.1 Install ERNIE Bot SDK

ERNIE Bot SDK (EB SDK for short) is the Python software development kit officially provided by Wenxin and PaddlePaddle. It can be installed with the following pip command:

pip install erniebot

EB SDK authentication mainly involves setting the backend and the access token, which are specified through the api_type and access_token parameters respectively. The aistudio backend is used by default (api_type is aistudio). Copy the access token from your personal center and use it in place of {YOUR-ACCESS-TOKEN} in your code.
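
The two settings described above can be sketched as follows. To avoid pasting the token into source files, this version reads it from an environment variable. The variable name `ERNIE_ACCESS_TOKEN` and both helper names are assumptions made for illustration, while `erniebot.api_type` and `erniebot.access_token` are the two settings named above.

```python
import os
from typing import Mapping, Optional


def load_access_token(env: Optional[Mapping[str, str]] = None,
                      name: str = "ERNIE_ACCESS_TOKEN") -> str:
    """Read the access token from an environment-style mapping."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"set the {name} environment variable to your access token"
        )
    return token


def configure_erniebot(token: str) -> None:
    """Apply the backend and token settings to the EB SDK."""
    # Imported here so that load_access_token works without the SDK installed.
    import erniebot
    erniebot.api_type = "aistudio"
    erniebot.access_token = token
```

Calling `configure_erniebot(load_access_token())` once at startup would then replace the hard-coded token.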

2.3.2 Obtain Token

1. Open the official website of the PaddlePaddle AI Studio community: https://aistudio.baidu.com/cooperate/erniebotsdk

2. After registering your account, click to view:

3. Get your own Token and copy it. Later we will use Python to call the interface:

2.3.3 Python code testing

Let's separately test the ability to write poems by calling Wenxin Yiyan interface through ERNIE Bot SDK. The complete code is as follows (just replace it with your own Token):

import erniebot


if __name__ == '__main__':
    # img_str and access_token must be supplied
    # Keywords from image recognition (tree, waterfall, river, canyon, mountains)
    img_str = '树, 瀑布, 江河, 峡谷, 山峦'
    access_token = "replace with your own token"
    # Prompt: "Based on <keywords>, write 10 two-line seven-character poems"
    content = '根据' + img_str + '写10首两句七言诗'

    erniebot.api_type = 'aistudio'
    erniebot.access_token = access_token
    response = erniebot.ChatCompletion.create(
        model='ernie-bot',
        messages=[{'role': 'user', 'content': content}],
    )
    print(response.result)

The running output is still very good:


Next, you only need to pass the generated poem to the web page for display.
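
Building the prompt in a small helper makes it easy to check that a single string is sent to the model, and splitting the reply into lines gives the web page something simple to display. Both `build_prompt` and `split_lines` are hypothetical helpers written for this sketch. The prompt wording is the one used in the test above.

```python
from typing import List


def build_prompt(keywords: List[str], count: int = 10) -> str:
    """Join image keywords into the poem-writing prompt."""
    cleaned = [k.strip() for k in keywords if k and k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    # "Based on <keywords>, write <count> two-line seven-character poems"
    return f"根据{', '.join(cleaned)}写{count}首两句七言诗"


def split_lines(reply: str) -> List[str]:
    """Split the model's reply into non-empty, trimmed lines for display."""
    return [line.strip() for line in reply.splitlines() if line.strip()]


if __name__ == "__main__":
    print(build_prompt(["树", "瀑布", "江河"]))
    print(split_lines("first line\n\n  second line  \n"))
```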

2.4 Web backend

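The backend is built on FastAPI and pydantic. FastAPI also needs python-multipart to parse form fields and uploaded files, and uvicorn is the server used to start the app in section 2.6. Install them with pip:

pip install fastapi
pip install pydantic
pip install python-multipart
pip install uvicorn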

The back-end of the web page implements the function of receiving the images and tokens passed in by the front-end, then calling the image recognition interface of Baidu Smart Cloud and the Wenxin Yiyan large model interface, and then returning the generated poems to the front-end of the web page. The back-end code is as follows:

from fastapi import FastAPI, File, Form, UploadFile
from pydantic import BaseModel
from fastapi.middleware.cors import CORSMiddleware
import erniebot
from aip import AipImageClassify

app = FastAPI()

# Origins allowed to call the backend from a browser
origins = [
    "http://localhost.tiangolo.com",
    "https://localhost.tiangolo.com",
    "http://localhost",
    "http://localhost:8000",
    "http://localhost:63342",
    # "http://127.0.0.1:8000"
]
app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Request schema (not used by the endpoint below, which reads form fields directly)
class PoeSimpleReq(BaseModel):
    access_token: str
    app_id: str
    api_key: str
    classify_secret_key: str
    file: bytes = File(default=None)


@app.post("/poe")
async def read_root(access_token: str = Form(), app_id: str = Form(), api_key: str = Form(),
                    classify_secret_key: str = Form(), image: UploadFile = File()):
    # Every field is needed to call the two services
    if access_token is None or app_id is None or api_key is None or classify_secret_key is None or image is None:
        return "Please fill in the information on the left"

    client = AipImageClassify(app_id, api_key, classify_secret_key)
    # Optional parameters (baike_num: number of Baidu Baike entries to return)
    options = {"baike_num": 5}
    # Call the general object and scene recognition interface
    result = client.advancedGeneral(image.file.read(), options)
    print("pic result:", result)

    # Join the recognized keywords into one comma-separated string
    keyword_list = []
    for res in result.get('result', []):
        keyword_list.append(res['keyword'])
    keyword_string = ', '.join(keyword_list)

    # Prompt: "Based on <keywords>, write 10 two-line seven-character poems"
    content = '根据' + keyword_string + '写10首两句七言诗'
    erniebot.api_type = 'aistudio'
    erniebot.access_token = access_token
    response = erniebot.ChatCompletion.create(
        model='ernie-bot',
        messages=[{'role': 'user', 'content': content}],
    )

    # Replace line breaks with spaces so the page can show the text on one line
    data_str = response.result
    return data_str.replace("\n", "  ")
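
To exercise the endpoint without the web page, a small client can post the same form fields that the backend reads. This is a sketch under assumptions: the helper names are hypothetical, the URL assumes the server runs locally on port 8000, and the third-party `requests` package is assumed to be installed. The field names match the parameters of the `/poe` endpoint above.

```python
import os
from typing import Dict


def build_poe_form(access_token: str, app_id: str, api_key: str,
                   classify_secret_key: str) -> Dict[str, str]:
    """Collect the four text fields that the /poe endpoint expects."""
    form = {
        "access_token": access_token,
        "app_id": app_id,
        "api_key": api_key,
        "classify_secret_key": classify_secret_key,
    }
    missing = [name for name, value in form.items() if not value]
    if missing:
        raise ValueError("missing form fields: " + ", ".join(missing))
    return form


def request_poem(image_path: str, form: Dict[str, str],
                 url: str = "http://localhost:8000/poe") -> str:
    """Upload an image with the form fields and return the poem text."""
    # Third-party dependency, imported here so build_poe_form works without it.
    import requests
    with open(image_path, "rb") as fp:
        files = {"image": (os.path.basename(image_path), fp)}
        response = requests.post(url, data=form, files=files, timeout=120)
    response.raise_for_status()
    # The endpoint returns a plain string, which FastAPI serializes as JSON.
    return response.json()
```

With the server running, `request_poem("1.png", build_poe_form(token, app_id, api_key, secret_key))` would return the same text that the web page displays.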

2.5 Complete project source code

I have uploaded the complete source code of this project to Gitee as open source. You are welcome to download and modify it yourself:
https://gitee.com/xiaoyuan-itsuper/AI.git

2.6 Project operation steps

1. Prepare the Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, and PaddlePaddle AI Studio access token in advance, following the tutorial above.

2. Download the complete source code, create a virtual environment, and install the dependency packages with pip.

3. Go to the source code directory, activate the virtual environment, and enter the following command in cmd to start the backend:

uvicorn main:app --reload

As shown in the picture:


4. Select the HTML file and open the web page from PyCharm:


5. Open the main interface as shown below:


6. Click to select a picture, and the picture is displayed:

7. Fill in the Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, and PaddlePaddle AI Studio access token prepared according to the tutorial above (note: all four are required):

8. Click the button to generate poems:


9. Wait a moment for the generation to finish:


OK, that is the complete operation process of the project. Give it a try yourself!

2.7 Operation process video

"AI Reading Pictures and Writing Poems" Project Operation Process

3. Future optimization

1. Optimize the UI design of the web homepage interface.

2. Add new functions through the Wenxin Yiyan interface, such as drawing from text and organizing documents.

3. Deploy and launch the website.

If you have more ideas, please leave a message in the comment area and we will implement it together!

4. Summary

Through this small case, we learned how to build an application in which AI looks at pictures and writes poems. It relies mainly on the image recognition interface of Baidu Smart Cloud and the text generation capability of the Wenxin Yiyan large model. Besides text generation, the Wenxin Yiyan large model can also generate images, audio, and video, all of which are well worth further exploration and application.

The Baidu World Conference, with the theme "Generating the Future", will be held at Beijing Shougang Park on October 17. In addition to Robin Li's on-site session "Teaching you step by step how to make AI native applications", Baidu World Conference 2023 will also present the latest progress in large models, AI native applications, the generative AI ecosystem, and more. Let us seize the opportunity of large model applications together!

Here is the live broadcast address: https://baiduworld.baidu.com/m/world/2023/#intro-title


Origin blog.csdn.net/yuan2019035055/article/details/133817699