First impressions of Baidu's AI model Wenxin Yiyan

Today I was notified that I can try Baidu's AI model Wenxin Yiyan. I had been waiting for more than a month and could not wait to try it. Below is a record of the experience.


1. Brief introduction

Enter through the official website, https://yiyan.baidu.com/, to reach the main interface.

In its self-introduction, Wenxin Yiyan says that, as an artificial intelligence language model, it is good at Chinese and English and follows the test rules, but that it does not know everything and is still learning and growing.

In its own words: "I am a knowledge-enhanced large language model developed by Baidu. My Chinese name is Wenxin Yiyan and my English name is ERNIE Bot. I can interact with people, answer questions, assist in creation, and help people obtain information, knowledge and inspiration efficiently and conveniently."

1.1 The model algorithm of Wenxin Yiyan

The algorithm of the Wenxin Yiyan model includes:

  • Pretrained language model. It is obtained by training large-scale machine learning algorithms on massive amounts of Internet text.
  • Natural language processing technology. This includes Chinese word segmentation, part-of-speech tagging, named entity recognition, entity relation extraction and other techniques.
  • Knowledge enhancement. New text is interpreted and generated using the pre-trained language model together with domain expert knowledge.
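To make "Chinese word segmentation" from the list above concrete, here is a minimal sketch of forward maximum matching, a classic dictionary-based segmentation method. It is only an illustration: the post does not say which segmentation method Wenxin Yiyan uses, and the function name `forward_max_match` and the tiny vocabulary are made up for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Split text into words by greedily taking the longest dictionary match.

    vocab and max_len are illustrative assumptions, not anything from Baidu.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


vocab = {"文心一言", "百度", "语言", "模型", "语言模型"}
print(forward_max_match("文心一言是百度的语言模型", vocab))
# ['文心一言', '是', '百度', '的', '语言模型']
```

Greedy matching like this is easy to follow but gets ambiguous sentences wrong, which is one reason real systems tend to rely on statistical or neural models instead.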

In general, Wenxin Yiyan is a knowledge-enhanced large language model that Baidu developed on top of its PaddlePaddle (Flying Paddle) framework and Wenxin large-model technology. It can interact with people, answer questions, assist in creation, and help people obtain information, knowledge and inspiration efficiently and conveniently.

1.2 The training data set of Wenxin Yiyan

The training data set of the Wenxin Yiyan model includes:

  • Webpage data: pages crawled by search engines and pages crawled from public network resources such as news websites, encyclopedias and Wikipedia.
  • Search data: users' search requests and search results on search engines such as Baidu Search.
  • Image data: images uploaded by users and search-engine image resources such as image search results.
  • Voice data: users' voice search requests and search results, for example from voice search engines.
  • Knowledge augmentation data: user-supplied knowledge and expert knowledge, such as questions entered by users and answers from domain experts.

These data come from public resources on the Internet and are used for large-scale pre-training of the model. The Wenxin Yiyan model can understand and generate natural-language text in Chinese and several foreign languages, and can perform tasks such as knowledge reasoning and question answering.

1.3 Image generation algorithm based on Wenxin Yiyan

The image generation algorithms of Wenxin Yiyan mainly include the following types:

  1. Deep-learning convolutional neural network (CNN): the image generation part of the Wenxin Yiyan model uses a convolutional neural network that learns the feature mapping between the input image and the generated image, and so generates images similar to the input.
  2. Generative adversarial network (GAN): the generator of the Wenxin Yiyan model is trained against a discriminator, and images are produced through the interplay of the two. A GAN can learn the feature mapping between the input image and the generated image, so that the generator outputs images similar to the input.
  3. Variational autoencoder (VAE): the model uses a variational autoencoder to learn the feature mapping between the input image and the generated image. A VAE can generate images similar to the input by learning the residual between the input image and the generated image.
  4. Other variants: besides the models above there are variants, such as models based on VAE and GATCN, which select a suitable model for image generation according to the task and its characteristics.

In general, the image generation algorithms of Wenxin Yiyan mainly include deep learning (CNNs), generative adversarial networks and variational autoencoders. A suitable model can be selected according to the task and its characteristics to make image generation more efficient and accurate.

1.4 Advantages and disadvantages of Wenxin Yiyan compared with ChatGPT

When first asked this question, Wenxin Yiyan answered in very general terms and did not address the question directly.

When asked about the advantages and disadvantages of Wenxin Yiyan and ChatGPT-4, it listed the advantages and disadvantages of both.

I asked several similar questions in a row and always got the same answers. I strongly suspect that this question has been manually annotated.

2. A few simple questions

2.1 Word processing and literary creation

1) Write a report on the development of the digital economy

Wenxin Yiyan produced an outline-style report, as follows:

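The digital economy is a new economic form that is built on digital technology and drives economic development through digital means and methods. It has broad prospects and potential, and plays an important role in promoting the transformation and upgrading of the real economy and in raising national competitiveness.
Background of the digital economy's development:
With the continuous development of information technology, the digital economy has gradually become an important driver of economic development. Its development is inseparable from information technology; in particular, the rapid development of the Internet and mobile communication technology has laid a solid foundation for it. At the same time, it is supported by the policy environment and pulled along by market demand.
Current state of the digital economy:
At present, the digital economy has become an important trend in global economic development. This shows mainly in the following aspects:
First, the application of digital technology keeps deepening. Digital technology is applied ever more deeply in every field, including traditional industries such as finance, healthcare and education, as well as emerging industries such as manufacturing, agriculture and logistics.
Second, the market size of the digital economy keeps expanding. It now covers digital services, e-commerce, digital entertainment and many other fields.
Third, the industrial chain structure of the digital economy keeps improving. Digital technology reaches ever deeper into every link of the industrial chain, which drives the development of the digital economy.
Problems in the development of the digital economy:
Although the digital economy has broad prospects and potential, it also faces some problems and challenges.
First, development is uneven. The level of digital economy development differs between regions and industries, and the imbalance is gradually becoming apparent.
Second, security. The digital economy must safeguard the security and privacy of digital assets, yet problems such as theft and misuse of digital assets occur as it develops.
Third, talent. The digital economy needs people with digital, information-technology and innovation skills, yet there is a shortage of such people.
Development trends of the digital economy:
In the future, the development of the digital economy will become more diversified and go deeper. Digital technology will be applied more widely, the market will expand further, the industrial chain structure will improve further, and more attention will be paid to innovation and the training of talent. At the same time, the digital economy will have to face challenges such as information security and the protection of digital assets.
Recommendations for the development of the digital economy:
To promote the development of the digital economy, we need to take a series of measures.
First, strengthen the policy environment for the digital economy. The government should formulate more proactive development policies, strengthen planning and guidance in the field, and promote the healthy development of the digital economy.
Second, strengthen the industrial chain structure of the digital economy. Government and enterprises should strengthen the application of digital technology, promote the improvement of the digital economy's industrial chain, and raise ... [the answer is cut off here]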

2) Make a PPT about the development of the digital economy

Wenxin Yiyan gave a rough outline template:

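Title page:
The development of the digital economy
Page 1:
Title: The development of the digital economy
Subtitle: The broad prospects and potential of the digital economy
Page 2:
Chart: Market size of the digital economy
Chart: Industrial chain structure of the digital economy
Text: The development of the digital economy is supported by the policy environment and market demand
Page 3:
Chart: Current state of the digital economy
Text: The digital economy has become an important trend in global economic development
Page 4:
Chart: Security issues in the digital economy
Text: The digital economy must safeguard the security and privacy of digital assets
Page 5:
Chart: Talent issues in the digital economy
Text: The digital economy needs people with digital, information-technology and innovation skills
Page 6:
Chart: Development trends of the digital economy
Text: In the future, the development of the digital economy will become more diversified and go deeper
Page 7:
Chart: Recommendations for the development of the digital economy
Text: Government, enterprises and individuals should jointly promote the development of the digital economy
Closing page:
Title: The development of the digital economy
Subtitle: The development of the digital economy is an important means of promoting the transformation and upgrading of the real economy and raising national competitiveness
Text: Let us work together to promote the development of the digital economy and contribute to realizing the digital Chinese Dream.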

On specific topics like these, Wenxin Yiyan's comprehension and summarizing ability are quite good.

3) The essay topic from Guangdong's 2022 college entrance examination, "本手、妙手、俗手" (the proper move, the brilliant move and the crude move in Go), with the essay written by Wenxin Yiyan

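Title: Proper move, brilliant move, crude move: three different ways of playing Go

In the world of Go, the proper move, the brilliant move and the crude move are three very important terms. They describe three different levels of play and matter greatly for learning and improving at Go.

The proper move is the orthodox play that conforms to Go principles. It is the foundation of Go and a skill every Go enthusiast should master. Learning it requires solid fundamentals; only through constant practice and thought can it gradually be mastered. In Go, the proper move is the foundation of foundations; only by mastering it can one advance steadily along the road of Go.

The brilliant move is an unexpected, exquisite play. It is an advanced technique that takes long study and practice to master. It requires not only solid fundamentals but also extraordinary creativity and imagination. In Go, the brilliant move is an expression of creativity, an innovation and a breakthrough; it makes people's eyes light up and deals the opponent an unexpected blow.

The crude move is a play that looks reasonable but usually loses out when seen from the whole board. Although it seems reasonable, overall it brings losses to the whole position and upsets the balance of the game. In Go, crude moves are to be avoided; they affect a player's level and confidence and can even lose the whole game.

In learning Go, we should note the importance of the proper move and the brilliant move, but must not ignore the existence of the crude move. Only through deep understanding and constant practice can we truly grasp the essence of Go and become an outstanding Go enthusiast.
In Go, the proper move is the foundation, the brilliant move is creation, and the crude move is to be avoided. Only through constant study and practice can we truly master the techniques of Go and become an outstanding Go enthusiast. Let us work hard together, keep learning and exploring, and pursue a higher level of Go.

Finally, I would like to quote one sentence to sum up this essay: "The proper move is the foundation, the brilliant move is creation, and the crude move is to be avoided." Let us always keep this sentence in mind as we learn and play Go, and keep pursuing a higher level of Go.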

I asked Wenxin Yiyan to grade its own essay, and the answer was very honest: about 40 points, because the essay only introduces the three terms and contains no insight or reflection of its own.

For comparison, here is the answer ChatGPT-3.5 gave earlier. It also stays on the subject of Go and never steps back to reflect on the wider meaning. In this respect, AI models still fall short.

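Title: A solid foundation is indispensable

To learn a skill, we must start from the basics. In Go, this mysterious and ancient game, beginners must first master the "proper move".

The "proper move" is the orthodox play that conforms to Go principles. It is a carefully considered way of playing that follows the rules. When learning Go, we should understand the "proper move" deeply, study its essence carefully, and practise it persistently. Only on the basis of the "proper move" can we improve our playing strength further.

However, some beginners are unwilling to concentrate on the "proper move". They prefer to attempt the "brilliant move", the unexpected and exquisite play. The "brilliant move" may bring surprise and joy, but it is often an extension of the "proper move". If we have not mastered the "proper move" solidly, it is hard to imagine any "brilliant move" appearing.

Moreover, if we do not stick to the "proper move", we may well set off down a road of no return: the "crude move". The "crude move" is a play that looks reasonable but usually loses out when seen from the whole board. It looks useful but is in fact not advisable. If we are not clear about the value of the "proper move", we easily fall into the trap of the "crude move". Then our playing strength will not rise but fall.

Therefore, when learning Go we should value work on the "proper move". It is the foundation; it not only helps us achieve good results in games but also creates "brilliant moves" for us. Only with a solid foundation can we shine with our own unique light.

Learning the "proper move" requires focus, seriousness and patience. It demands a great deal of time and energy. We must also play strictly according to the requirements of the "proper move" and practise again and again. Only through constant practice can we master the "proper move" and bring it into our games.
The "proper move" also gives us guidance, helping us better understand the rules and subtleties of Go. It helps us build a good playing style and makes our play better.

In practice, we can gain experience and technique by encountering many different "proper moves". For example, we can play against stronger players and learn the essence of the "proper move" from them. We can also read Go tutorials to obtain the relevant knowledge. Through constant study and practice, our level of "proper move" will surely improve.

In short, the "proper move" is indispensable in learning Go. We should cherish it and strive to master it. Only then can we truly become excellent Go players.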

2.2 Mathematical logic and calculation

1) Five years ago, the mother was 7 times as old as her son. The son is 9 years old this year. How old is the mother?

Even for such a simple question the reasoning is wrong, yet it can work out another, similar question. It feels as though manual annotation was done for that one during training.
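For reference, the age question above can be checked in a few lines. This is just the arithmetic written out step by step; the function name `mother_age_now` is made up for the example.

```python
def mother_age_now(son_age_now, years_ago, ratio):
    """Solve: `years_ago` years ago the mother was `ratio` times the son's age."""
    son_then = son_age_now - years_ago   # 9 - 5 = 4
    mother_then = ratio * son_then       # 7 * 4 = 28
    return mother_then + years_ago       # 28 + 5 = 33


print(mother_age_now(son_age_now=9, years_ago=5, ratio=7))  # 33
```

So the expected answer is 33.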

2) Derive the Fourier transform

Wenxin Yiyan gave two derivations: one based on functions of a complex variable, and one based on Fourier series.
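For readers who want the second route spelled out, here is the usual textbook sketch of getting from the Fourier series to the Fourier transform. It is not a transcription of Wenxin Yiyan's answer, and it assumes that f is well behaved enough for the limit to exist; the symbols T, ω_n and F_T are introduced here only for the sketch.

```latex
% Fourier series of f on the interval [-T/2, T/2], with \omega_n = 2\pi n / T:
f(t) = \sum_{n=-\infty}^{\infty} c_n \, e^{i \omega_n t},
\qquad
c_n = \frac{1}{T} \int_{-T/2}^{T/2} f(t) \, e^{-i \omega_n t} \, dt .

% Define the truncated transform and the frequency spacing:
F_T(\omega) = \int_{-T/2}^{T/2} f(t) \, e^{-i \omega t} \, dt,
\qquad
\Delta\omega = \omega_{n+1} - \omega_n = \frac{2\pi}{T},
\qquad\text{so}\qquad
c_n = \frac{\Delta\omega}{2\pi} \, F_T(\omega_n) .

% Substituting c_n turns the series into a Riemann sum:
f(t) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} F_T(\omega_n) \, e^{i \omega_n t} \, \Delta\omega .

% Letting T \to \infty (so \Delta\omega \to 0) gives the transform pair:
F(\omega) = \int_{-\infty}^{\infty} f(t) \, e^{-i \omega t} \, dt,
\qquad
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) \, e^{i \omega t} \, d\omega .
```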

2.3 Coding ability

1) Implement sorting algorithms in Python

Wenxin Yiyan gave five algorithms: bubble sort, selection sort, insertion sort, quick sort and merge sort.

The following are sorting algorithms commonly used in Python:
# Bubble sort
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr
# Selection sort
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i+1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
# Insertion sort
def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = key
    return arr
# Quick sort
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = []
        right = []
        for i in range(1, len(arr)):
            if arr[i] < pivot:
                left.append(arr[i])
            else:
                right.append(arr[i])
        return quick_sort(left) + [pivot] + quick_sort(right)
# Merge sort
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        mid = len(arr) // 2
        left = arr[:mid]
        right = arr[mid:]
        # Caution: this only concatenates the two halves; without a merge step
        # the list comes back in its original order.
        return merge_sort(left) + merge_sort(right)
These algorithms are all based on comparison and swapping; choose among them according to the situation.
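Four of the five functions sort correctly, but the quoted merge_sort never merges its two halves, so it returns the list unchanged. A minimal corrected sketch, with the merge step written out, might look like this (it is a standard merge sort, not something Wenxin Yiyan produced):

```python
def merge_sort(arr):
    """Return a new sorted list using top-down merge sort."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    # Merge step: repeatedly take the smaller head element of the two halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the halves still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```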

2) Use a Python program to process the server's CPU data and generate charts with ECharts

The complete answer is as follows:

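To process a server's CPU data with a Python program and generate charts with ECharts, you need Python's socket module and the pyecharts module.
First, you need to install the socket module and the pyecharts module. You can install them in a terminal with the following commands:
pip install socket socket-ssl
pip install pyecharts
Next, you need to write a Python program that monitors the server's CPU usage. This program should run when the server starts and should check the CPU usage once per second. You can use the socket.getrusage() function to get the CPU usage.
import socket
import time

# Note: as written, this is a TCP echo server on port 8000; it never measures CPU usage.
def cpu_usage():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('localhost', 8000))
    sock.listen(1)
    while True:
        conn, addr = sock.accept()
        try:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)
                time.sleep(0.1)
        finally:
            conn.close()

cpu_usage()

In this program, the cpu_usage() function checks the CPU usage once per second. It uses the socket.getrusage() function to get the CPU usage and sends it back to the client.
Next, you need to use the pyecharts module to generate the chart. You can install it in a terminal with the following command:
pip install pyecharts
Then, you need to write a Python program that sends the CPU usage and other data back to ECharts. You can use the pyecharts.charts module to create the chart.
from pyecharts import charts
from pyecharts.charts import Line

cpu_data = []
cpu_chart = charts.Line()
cpu_chart.add('cpu_usage', cpu_data)
cpu_chart.render('cpu_usage.html')
In this program, cpu_data is a list containing the CPU usage. cpu_chart is an ECharts chart object, and cpu_data is added to the chart. Finally, the cpu_chart.render() function saves the chart to a file named cpu_usage.html.
Please note that this is only a simple example program. You need to modify and adjust it to your actual needs.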
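The quoted answer does not do what it describes. socket is part of Python's standard library, so it needs no pip install; it has no getrusage() function; and the server shown only echoes back whatever it receives. For contrast, here is a rough standard-library sketch of the task. It rests on several assumptions that are not in the original post: the data comes from the aggregate "cpu" line of /proc/stat on a Linux server, the sample readings are invented, the helper names are hypothetical, and the chart is written as a plain ECharts option embedded in an HTML page loaded from the jsDelivr CDN instead of going through pyecharts.

```python
import json


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected a line starting with 'cpu'")
    return [int(value) for value in fields[1:]]


def cpu_percent(prev, curr):
    """Percentage of non-idle time between two counter snapshots."""
    # Counters 0..7 are user, nice, system, idle, iowait, irq, softirq, steal.
    total = sum(curr[:8]) - sum(prev[:8])
    idle = (curr[3] + curr[4]) - (prev[3] + prev[4])
    if total <= 0:
        return 0.0
    return round(100.0 * (total - idle) / total, 1)


def build_option(labels, values):
    """Build an ECharts option dict for a simple line chart."""
    return {
        "title": {"text": "CPU usage (%)"},
        "xAxis": {"type": "category", "data": labels},
        "yAxis": {"type": "value", "min": 0, "max": 100},
        "series": [{"name": "cpu_usage", "type": "line", "data": values}],
    }


HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/echarts@5/dist/echarts.min.js"></script>
</head>
<body>
  <div id="chart" style="width: 800px; height: 400px;"></div>
  <script>
    echarts.init(document.getElementById("chart")).setOption(__OPTION__);
  </script>
</body>
</html>
"""


def render_html(option):
    """Return a standalone HTML page that draws the given ECharts option."""
    return HTML_TEMPLATE.replace("__OPTION__", json.dumps(option))


# Invented sample readings, as if /proc/stat had been read one second apart.
samples = [
    "cpu  100 0 50 800 50 0 0 0 0 0",
    "cpu  160 0 70 900 70 0 0 0 0 0",
    "cpu  180 0 80 1060 80 0 0 0 0 0",
]
snapshots = [parse_cpu_line(line) for line in samples]
values = [cpu_percent(a, b) for a, b in zip(snapshots, snapshots[1:])]
labels = ["t+1s", "t+2s"]
page = render_html(build_option(labels, values))
print(values)  # [40.0, 15.0]

if __name__ == "__main__":
    with open("cpu_usage.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

On a real server the sample lines would be replaced by reading the first line of /proc/stat at a fixed interval. A library such as psutil would be the more portable choice, at the cost of an extra dependency.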

2.4 Image generation

Wenxin Yiyan's earlier attempts at drawing were ridiculed all over the internet. For example, its pictures for "couple's lung slices" (the Sichuan dish fuqi feipian) and "heavy traffic" had nothing to do with the names and made no sense.

1) Draw a picture: heavy traffic

Compared with before, it has indeed improved a lot.

2) Understanding of Chinese semantics: draw a picture of "The sunset clouds fly together with a lone duck; the autumn water and the endless sky share one color"

2.5 Professional domain knowledge

1) What are the idempotent design and implementation methods of distributed interfaces?

  1. Idempotency based on atomic operations: an atomic operation means that, in a distributed system, the failure of one node during any operation does not make the whole system unavailable. This approach ensures the idempotency of the interface through atomic operations.
  2. Idempotency based on distributed locks: a distributed lock is a mechanism that guarantees mutually exclusive access to a shared resource among multiple nodes in a distributed system. This approach ensures the idempotency of the interface by using distributed locks.
  3. Idempotency based on message queues: a message queue is a mechanism for delivering messages asynchronously in a distributed system. This approach ensures the idempotency of the interface by using a message queue.
  4. Idempotency based on data replication: data replication is a mechanism for synchronizing data in a distributed system. This approach ensures the idempotency of the interface by using data replication.

The above are some common ways to design and implement idempotency for distributed interfaces. Which one to choose should be weighed against the specific business scenario and requirements.
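The answer above never shows what an idempotent interface looks like in practice. One common pattern is an idempotency key: the caller attaches a unique request ID, and the server performs the action only the first time it sees that ID. The sketch below is a single-process illustration under stated assumptions. The class `IdempotentHandler`, the `deduct` action and the ID "order-42" are all hypothetical, and an in-memory dict guarded by a lock stands in for the shared store (for example a database unique key) that a real distributed system would need.

```python
import threading


class IdempotentHandler:
    """Run an action at most once per request ID and replay the stored result."""

    def __init__(self, action):
        self._action = action
        self._results = {}             # request_id -> result of the first call
        self._lock = threading.Lock()  # makes check-and-store one atomic step

    def handle(self, request_id, payload):
        with self._lock:
            if request_id in self._results:
                # Duplicate delivery or client retry: do not repeat the side effect.
                return self._results[request_id]
            result = self._action(payload)
            self._results[request_id] = result
            return result


balance = {"amount": 100}


def deduct(amount):
    """A side effect that must not be applied twice for the same order."""
    balance["amount"] -= amount
    return balance["amount"]


handler = IdempotentHandler(deduct)
first = handler.handle("order-42", 30)
retry = handler.handle("order-42", 30)   # same ID, so nothing is deducted again
print(first, retry, balance["amount"])   # 70 70 70
```

Holding the lock while the action runs keeps the sketch simple but serializes all requests; a production design would usually lock or reserve per key instead.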

2) What schools of Chinese medicine have there been since ancient times? Which schools are closest to the essence of Chinese medicine?

3) What is the current state of China's domestically developed databases? What are the representative database products?

Of course, judged as professional-domain knowledge, the answers above do not satisfy me.

3. Summary

After trying Baidu's Wenxin Yiyan model, I have a few impressions:

  1. The drawing function has improved qualitatively, and irrelevant results are rarer, which suggests that it is not simply translating the Chinese prompt into English and calling an API.
  2. Mathematical logic still needs work; it makes elementary mistakes in some simple mathematical reasoning.
  3. In text processing and creative writing, rich emotional and intellectual depth is still beyond the reach of AI.
  4. Its accuracy in specialist fields needs to be checked carefully.
  5. Some sensitive topics are blocked; for example, when I asked about the 20th National Congress, the conversation ended immediately.

References:

  1. https://yiyan.baidu.com/
  2. Experiencing ChatGPT, the strongest AI model on Earth

Origin blog.csdn.net/solihawk/article/details/130356813