At the beginning of this year, the popularity of ChatGPT set off an upsurge in the development of large AI models around the world, and technology companies at home and abroad have joined the "Battle of 100 Models". After Baidu took the lead in releasing the first domestic artificial intelligence large language model "Wenxin Yiyan", it launched the Wenxin Qianfan large model platform to help enterprises and developers accelerate the implementation of large model applications.
Recently, Baidu founder, chairman and CEO Robin Li revealed the big news about Baidu World Conference 2023 at an event. He will "teach you step-by-step how to do AI" at the Baidu World Conference on October 17. native applications”.
I also had a sudden idea to use Wen Xinyiyan to build a small AI application: "Look at pictures and write poems". Just do it, and then follow the blogger to implement this web page!
Article directory
- 1. Implementation ideas
- 2. Steps to implement the web page of "AI Looks at Pictures and Writes Poems"
- 3. Future optimization
- 4. Summary
1. Implementation ideas
1. Design a web page to implement the function of accepting uploaded pictures and receiving Wen Xin Yi Yan Token.
2. Python calls the image recognition interface of Baidu Smart Cloud to identify the image category and content.
3. Python calls the Wen Xin Yi Yan interface, inputs the image category, and writes poems through Wen Xin Yi Yan
4. The Python backend returns the poem to the web page
2. Steps to implement the web page of "AI Looks at Pictures and Writes Poems"
2.1 Web front-end
The front end of the webpage uses HTML+CSS+JavaScript technology to implement the functions of uploading pictures, displaying pictures, passing in Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, Feijian Galaxy Access Token and click to write poems. The interface is as follows :
2.2 Image recognition
Baidu Intelligent Image Recognition Interface is an artificial intelligence service provided by Baidu, which can perform high-precision content recognition on pictures. The interface supports a variety of image recognition tasks, including general object recognition, scene recognition, text recognition, animal recognition, etc. Here We directly call the free universal object recognition image recognition interface through Python, which greatly improves the efficiency of development work.
2.2.1 Install Baidu Smart Cloud Python SDK
Baidu Smart Cloud Python SDK can be installed through pip. Enter the following command in the terminal:
pip install baidu-aip
Once installed, you can import the package in your Python code:
from aip import AipImageClassify
2.2.2 Create application
1. Log in to the official website of Baidu Smart Cloud:https://console.bce.baidu.com
2. Find and click Products 》Artificial Intelligence 》Image Recognition
3. After logging in to the console, click Free Trial:
4. Check all and click 0 yuan to receive:
5. After the creation is successful, click the application to enter the application details page, enter the application management menu, and click API Key to view the API Key and Secret Key, which are used by Python code to call the API.
2.2.3 Python code testing
After the above two steps of preparation, we can start writing Python code to implement Baidu intelligent image recognition. The following is a simple test example that requires modifying the AppId, API Key, Secret Key and image path:
from aip import AipImageClassify
# 定义百度智能云API的参数
APP_ID = '你的API ID'
API_KEY = '你的API Key'
SECRET_KEY = '你的 Secret Key'
# 实例化AipImageClassify
client = AipImageClassify(APP_ID, API_KEY, SECRET_KEY)
# 读取并设置图片路径
filePath = "1.png"
# 打开图片文件
with open(filePath, 'rb') as fp:
image = fp.read()
# 定义可选参数
options = {
"baike_num": 1}
# 调用图片标签识别接口
result = client.advancedGeneral(image, options)
# 输出结果
for res in result['result']:
print(res['keyword'], end=", ")
The picture I output is this landscape picture, which is still very challenging without text:
Recognition effect, all the information on the picture is recognized:
Next we need to pass the recognized image content to Wen Xinyiyan to write a poem.
2.3 Write poems with a literary heart
Here I provide a convenient and easy-to-use interface through ERNIE Bot SDK, which can call Wen Xinyiyan's capabilities, including text creation, universal dialogue, semantic vectors, Al drawing, etc., and can call 1 million Tokens for free:
2.3.1 Install ERNIE Bot SDK
ERNIE Bot SDK is a Python software development toolkit officially provided by Wenxin & Feipiao, referred to as EB SDK. It can be installed through the following pip command:
pip install erniebot
EB SDK authentication mainly involves setting up the backend and access token, which are specified through the api_type and access_token parameters respectively. The aistudio backend is used by default (api_type is aistudio). Copy the personal center token Token and fill it in the following code. (Replace {YOUR-ACCESS-TOKEN}):
2.3.2 Obtain Token
1. Open the official website of Feijian Galaxy Community:https://aistudio.baidu.com/cooperate/erniebotsdk
2. After registering your account, click to view:
3. Get your own Token and copy it. Later we will use Python to call the interface:
2.3.3 Python code testing
Let's separately test the ability to write poems by calling Wenxin Yiyan interface through ERNIE Bot SDK. The complete code is as follows (just replace it with your own Token):
import erniebot
if __name__ == '__main__':
# img_str,access_token需要传入
img_str = '树, 瀑布, 江河, 峡谷, 山峦' # 这里需要图片识别的内容信息
access_token = "这里替换为自己的TOken"
content = '根据'+img_str,'写10首两句七言诗'
erniebot.api_type = 'aistudio'
erniebot.access_token = access_token
response = erniebot.ChatCompletion.create(
model='ernie-bot',
messages=[{
'role': 'user', 'content': f"{
content}”"}],
)
print(response.result)
The running output is still very good:
Next, you only need to pass the generated poem to the web page for display.
2.4 Web backend
The following two Python libraries are used, execute the pip command to install them:
pip install fastapi
pip install pydantic
The back-end of the web page implements the function of receiving the images and tokens passed in by the front-end, then calling the image recognition interface of Baidu Smart Cloud and the Wenxin Yiyan large model interface, and then returning the generated poems to the front-end of the web page. The back-end code is as follows:
from fastapi import FastAPI, File, Form, UploadFile
from pydantic import BaseModel
from fastapi.middleware.cors import CORSMiddleware
import erniebot
from aip import AipImageClassify
app = FastAPI()
origins = [
"http://localhost.tiangolo.com",
"https://localhost.tiangolo.com",
"http://localhost",
"http://localhost:8000",
"http://localhost:63342",
# "http://127.0.0.1:8000"
]
app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
class PoeSimpleReq(BaseModel):
access_token: str
app_id: str
api_key: str
classify_secret_key: str
file: bytes = File(default=None)
@app.post("/poe")
async def read_root(access_token: str = Form(), app_id: str = Form(), api_key: str = Form(),
classify_secret_key: str = Form(), image: UploadFile = File()):
if access_token is None or app_id is None or api_key is None or classify_secret_key is None or classify_secret_key is None or image is None:
return "Supplementary information on the left"
access_token = access_token
APP_ID = app_id
API_KEY = api_key
CLASSIFY_SECRET_KEY = classify_secret_key
client = AipImageClassify(APP_ID, API_KEY, CLASSIFY_SECRET_KEY)
# 定义可选参数
options = {
"baike_num": 5}
# 调用图片标签识别接口
result = client.advancedGeneral(image.file.read(), options)
print("pic result:", result)
keyword_list = []
for res in result['result']:
keyword_list.append(res['keyword'])
keyword_string = ', '.join(keyword_list)
content = '根据' + keyword_string + '写10首两句七言诗'
erniebot.api_type = 'aistudio'
erniebot.access_token = access_token
response = erniebot.ChatCompletion.create(
model='ernie-bot',
messages=[{
'role': 'user', 'content': f"{
content}”"}],
)
data_str = response.result
return data_str.replace("\n", " ")
2.5 Complete project copy
The blogger has uploaded the complete source code of this project to Gitee as open source. Friends can download and modify the project by themselves:
https://gitee.com/xiaoyuan-itsuper /AI.git
2.6 Project operation steps
1. Prepare Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, and Feijian Galaxy Access Token in advance according to the above tutorial.
2. Download and copy the complete source code, create a virtual environment, and install dependency packages with pip
3. Enter the source code path, start the corresponding virtual environment, and enter the following command in cmd to start the code:
uvicorn main:app --reload
As shown in the picture:
4. Select the HTML code and click to open the web page from pycharm:
5. Open the main interface as shown below:
6. Click to select the picture and display the picture:
7. Fill in the Baidu Smart Cloud AppId, Baidu Smart Cloud API Key, Baidu Smart Cloud Secret Key, and Feijian Galaxy Access Token prepared according to the above tutorial (note: all four of these need to be filled in):
8. Click to generate a poem:
9. Wait for a while and the generation is completed:
OK, the complete project operation process is over here, why don’t you log in and experience it!
2.7 Operation process video
"AI Reading Pictures and Writing Poems" Project Operation Process
3. Future optimization
1. Optimize the UI design of the web homepage interface.
2. Enhance new functions, such as: drawing based on text, organizing documents, etc. through Wen Xinyiyan’s interface.
3. Website deployment and launch
If you have more ideas, please leave a message in the comment area and we will implement it together!
4. Summary
Through this small case, we learned how to implement the application of reading pictures and writing poems through AI. It mainly relies on the image recognition interface of Baidu Intelligent Cloud and the text generation capability of the Wen Xin Yi Yan large model. Of course, in addition to text generation, Wen Xin Yi Yan Large models can also generate images, audio and video functions, etc. These are very worthy of our continued exploration and application.
The Baidu World Conference with the theme of "Generating the Future" will be held in Beijing Shougang Park on October 17. In addition to Robin Li's on-site teaching "Teaching you step by step how to make AI native applications", Baidu World Conference 2023 will also bring large models, The latest progress in AI native applications, generative AI ecosystem, etc. Let us seize this opportunity of large model application together!
Share the live broadcast address to everyone:https://baiduworld.baidu.com/m/world/2023/#intro-title