Philosophical Talk on Natural Language Models

Recently, the performance of large language models, represented by ChatGPT, has been remarkable. Capabilities such as "In Context Learning", "Instruct" [1], and reasoning make it hard not to look forward to the future of artificial intelligence, and at the same time to think calmly about why natural language models have been able to make such progress.


How do we think? What mental activity goes on when humans do not speak? In other words, among our complex thinking activities, which are innate and which are learned? How exactly does thinking work? What is language?

1 Empty your brain and start from 0

Try to empty your mind, think of nothing, and regard this as the 0 state (initial state) of thinking. You will find yourself unconsciously noticing inconspicuous changes such as light, sound, and the folds of your clothes. Even in a completely dark and quiet place, you can still feel your heartbeat and breathing. The so-called empty state can be understood as the brain lazily ignoring the input from its sensors (the sensory organs), that is, "not thinking".

In this state, no letters or words "pop" into the mind's eye. If your control is imperfect, a few very intuitive associated words may simply pop up, such as "hot" or "dazzling". At this moment we are basically indistinguishable from ordinary vertebrates. The neural activity that corresponds to the most direct feeling of the surroundings should have evolved early in animals, possibly dating back even to the insects of the Triassic period. Let's start by comparing with primitive single-celled organisms.

In middle school I learned that single-celled organisms respond to differences in temperature and light across the culture medium: the resulting physical and chemical differences deflect the flagella, so the organism swims toward the warmer side. This is not neural activity, however, but an instinct shaped by natural selection (individuals that cannot deflect their flagella are eliminated). The underlying physicochemical structure is determined by the DNA and RNA that control protein synthesis, and it keeps improving over generations.

Evolution that relies on mutation can be very fast for single-celled organisms and even viruses. For example, when antibiotics are overused, superbugs can emerge within a short period of time. For more complex organisms it is likely to be slower. This is actually a simple probability problem. As species advance along the evolutionary map, they become more and more complex: the more modules a system is made of, the tighter the coupling between those modules. An individual survives only when a mutation brings a benefit without destroying an existing function. Without directed variation, a mutation that "just happens" to meet the needs of natural selection is very unlikely to appear, or needs a much longer time scale to accumulate.
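The probability argument can be made concrete with a deliberately crude model. Assume (this is a simplification for illustration, not a biological result) that a random mutation leaves any one of the n coupled modules working with probability p, independently of the others. It then leaves the whole organism working with probability p to the power n, which shrinks quickly as n grows.

```python
def harmless_probability(p: float, n: int) -> float:
    """Probability that a random mutation breaks none of n coupled modules.

    Assumption: each module independently survives the mutation with
    probability p. Real genomes are far less tidy than this.
    """
    return p ** n


# The same per-module survival chance, applied to ever more complex organisms.
for modules in (1, 10, 100):
    print(modules, harmless_probability(0.95, modules))
```

The function name and the value 0.95 are hypothetical; only the trend matters.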

Going back to single-celled organisms: the way an individual changes its activity in response to a perceived physical or chemical parameter is fixed and deterministic.

[Diagram: environmental stimulus → (acts on) → biological material → (produces) → contraction and deflection]

The ability of single-celled organisms to adapt to their surroundings depends on the properties of specific proteins whose synthesis is directed by DNA. If we implemented this little cell with a Raspberry Pi, it would look like this:

[Diagram: photosensitive sensors n1, n2, n3, n4 → (read) → Raspberry Pi, which computes [L,R]=F(n1,n2,n3,n4) → left and right stepper motor group]
[Figure: Raspberry Pi (https://img-blog.csdnimg.cn/2a575c90f3b849bcbf66ebac5987bb54.png)]

We need a function that takes four photosensitive values as input and outputs the number of steps for the left and right stepper motors. We hand this task to Yuanruyi's C Know (GPT2):

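QUESTION:
My Raspberry Pi car is connected to 4 light sensors, and each sensor returns a brightness value.
The brightness value ranges from 0 to 64, where 0 means no light at all and 64 means very bright light.
The four sensors are mounted facing east, south, west, and north.
The car moves by controlling its left and right wheels; two stepper motors can turn the wheels forward or backward.
The stepper motor output is an integer that can only be one of -1, 0, 1. A value of -1 rotates the motor backward by 10 degrees; 0 stops the motor; 1 rotates the motor forward by 10 degrees.
The control program calls a function F once per second. The input of F is the brightness of the four sensors, l1, l2, l3, l4; the output is the step of the left and right motors, n1, n2. Please implement this function F so that the car moves toward the brightest light.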

C Know's answer:

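# This is a programming question and can be answered directly.
# Compare the brightness values returned by the four sensors to determine the direction of the brightest light, then control the direction of the left and right wheels so that the car moves toward the brightest light.
# The concrete method is as follows:
# 1. Read the brightness values l1, l2, l3, l4 of the four sensors.
# 2. Compare the four values and find the direction dir of the maximum; for example, if l1 is the maximum, dir is east. If there is more than one maximum, a direction can be chosen at random.
# 3. Compute the step values n1, n2 of the left and right motors from dir:
# If dir is east, the right wheel turns forward and the left wheel turns backward; n1 is 1 and n2 is -1.
# If dir is south, the left wheel turns forward and the right wheel turns backward; n1 is -1 and n2 is 1.
# If dir is west, the left wheel turns forward and the right wheel turns backward; n1 is -1 and n2 is 1.
# If dir is north, the right wheel turns forward and the left wheel turns backward; n1 is 1 and n2 is -1.
# 4. Return the computed left and right motor step values n1, n2.
# The code is as follows: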

def F(l1, l2, l3, l4):
    dir = ['east', 'south', 'west', 'north'][[l1, l2, l3, l4].index(max(l1, l2, l3, l4))]
    if dir == 'east':
        n1, n2 = 1, -1
    elif dir == 'south':
        n1, n2 = -1, 1
    elif dir == 'west':
        n1, n2 = -1, 1
    else:
        n1, n2 = 1, -1
    return n1, n2

This code is not an optimal solution: for example, the four values could be interpolated to estimate the brightest direction of the light field before steering. Still, it is basically logical programming.
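The interpolation idea can be sketched as follows. This is only one possible reading, under assumptions that are not in the quoted answer: l1..l4 are the east, south, west and north sensors, the car's nose points "north", and n1 drives the left wheel. Note that every branch of the quoted F returns opposite wheel directions, so under these assumptions the car would only turn on the spot; the sketch below also drives forward when the light is roughly ahead. The name F_interp and the 30-degree dead zone are hypothetical.

```python
import math


def F_interp(l1, l2, l3, l4, dead_zone_deg=30.0):
    """Return (n1, n2) = (left, right) motor steps, each in {-1, 0, 1}.

    Assumptions: l1..l4 = east, south, west, north sensors; the car's
    nose points "north"; n1 is the left wheel and n2 the right wheel.
    """
    x = l1 - l3  # brightness gradient toward the car's right (east minus west)
    y = l4 - l2  # brightness gradient toward the car's front (north minus south)
    if x == 0 and y == 0:
        return 0, 0  # uniform light: no preferred direction, so stop
    # Bearing of the brightest direction: 0 = straight ahead, positive = to the right.
    angle = math.degrees(math.atan2(x, y))
    if abs(angle) <= dead_zone_deg:
        return 1, 1  # light roughly ahead: drive forward
    if angle > 0:
        return 1, -1  # light to the right or directly behind: spin clockwise
    return -1, 1  # light to the left: spin counter-clockwise


print(F_interp(0, 0, 0, 64))   # light ahead
print(F_interp(64, 0, 0, 0))   # light to the right
```

Using the difference of opposite sensors lets two partially lit sensors pull the estimate between them, which a plain maximum cannot do.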

The DNA of a single-celled organism directs the production of a particular protein structure that has the same effect as the code above. This structure is the product of natural selection and is not a form of intelligence. In fact, we could also build such a differential controller from analog components such as transistors, capacitors, resistors, and inductors. Realizing this function does not require digital circuits or computers.

2 Simple neural activity

Primitive multicellular organisms were more complex, finding prey with a primitive sense of smell and touch. They are largely incapable of learning at the level of the individual. Their adaptation to nature is accomplished by natural selection and genetic mutation, which may span many generations. But multicellular organisms have evolved primitive neural activity, such as precisely tracking and capturing prey through perception and stress responses. For example, a simple coelenterate can use olfactory and tactile cells (which have not yet formed complex organs such as eyes) to locate a small shrimp floating by, and wriggle its body to suck it in.

This overall stress behavior across multiple cells requires nerve conduction. Although these creatures do not have brains, they do have ganglia and form simple neural networks. The principle of this network is as follows:

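[Diagram: environmental stimulus → (input) → sensory cells → (input) → neural network → (control) → motor cells → (cause) → environmental change → (input) → sensory cells]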

Suppose the tactile cells of this coelenterate are located around the mouthparts and can sense the concentration of shrimp molecules in 16 directions. When the concentration reaches a threshold, the lengths of 16 groups of muscle fibers in the body are adjusted so that the mouthparts quickly track and touch the shrimp. What the sensors feel is the combined result of the animal's own motion and the environmental stimulus. This simple feedback system lets the tentacles keep closing in on the prey.
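The feedback loop described above can be sketched as a memoryless controller. Everything numeric here is an assumption for illustration: the concentration model in sense, the threshold, the gain, and the 2D geometry are hypothetical; only the 16 directions and the idea of threshold-triggered muscle groups come from the text.

```python
import math

N = 16  # sensing directions and muscle groups, as in the text
DIRECTIONS = [2 * math.pi * k / N for k in range(N)]


def sense(mouth, shrimp):
    """Hypothetical concentration model: a sensor reads more when it
    faces the shrimp and when the shrimp is close."""
    dx, dy = shrimp[0] - mouth[0], shrimp[1] - mouth[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    return [max(0.0, math.cos(bearing - d)) / (1.0 + dist) for d in DIRECTIONS]


def reflex(concentrations, threshold=0.05):
    """Stateless reflex: a muscle group pulls in proportion to its own
    sensor reading, but only once the reading reaches the threshold."""
    return [c if c >= threshold else 0.0 for c in concentrations]


def step(mouth, pulls, gain=0.1):
    """Move the mouth by the sum of all muscle pulls."""
    mx = mouth[0] + gain * sum(p * math.cos(d) for p, d in zip(pulls, DIRECTIONS))
    my = mouth[1] + gain * sum(p * math.sin(d) for p, d in zip(pulls, DIRECTIONS))
    return (mx, my)


def track(mouth, shrimp, steps=200):
    """Closed loop: sense, react, move, and sense again. No memory is kept."""
    for _ in range(steps):
        mouth = step(mouth, reflex(sense(mouth, shrimp)))
    return mouth


final = track((0.0, 0.0), (3.0, 4.0))
print(math.hypot(final[0] - 3.0, final[1] - 4.0))  # remaining distance to the shrimp
```

Because each move changes the next sensor reading, the loop closes in on the shrimp and then hovers near it, without storing anything between steps.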


This is a typical mechanical automatic control problem. With a hydraulic system and an ultrasonic sensor, this control structure could be reproduced quite realistically. The code that controls the hydraulic system does not need a neural network; it can rely on classical control theory based on calculus. It should be said that science has not yet reached a firm conclusion on whether this structure contains any "memory". But according to common sense and what biology teachers generally say, coelenterates have no memory. The same individual jellyfish cannot learn from failure. For example, if one of a jellyfish's tentacles is cut off by a sharp piece of glass, its other tentacles will not avoid the glass.

It should also be noted that the squid is not a coelenterate but a mollusc. It has a brain, or rather a distributed, multi-core nervous system.

3 Decoupling of perception and action

If you think carefully about how the two kinds of activity above differ from those of higher organisms, the most obvious difference is the strong coupling between perception and action. For lower creatures:

  • The data (or neural signals) obtained by perceiving the environment directly determine the action. The decision is immediate, intuitive, and direct, with no (or only weak) memory.
  • This kind of stress response is instantaneous and cannot adapt to long-term regularities in how things change over time.

When organisms continue to evolve and have ganglia or primitive brains dedicated to complex neural activities, the coupling between perception and action is reduced. This complexity led to an important advance, the generation of "experience". This capability allows the use of past accumulated experience to guide subsequent activities.

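[Diagram: environmental stimulus → (input) → sensor → (input) → computing unit → (control) → motor system → (cause) → environmental change → (input) → sensor; the computing unit reads from and writes to a storage unit]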

Here is a typical thought experiment. Prepare two kinds of fish food that differ in appearance and color: one tastes good to the fish, the other does not. After repeated random feeding, the fish will learn to choose the tasty color or shape in advance.

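Loosely speaking, at this point the fish has formed a concept: for example, square red food is really tasty, and round green food is not. To remember this knowledge, the memory part of the fish's nervous system records concepts such as shape and color, and links them contextually with "tasty" and "not tasty".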

[Diagram: square → tasty; round → unpalatable]

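With this ability to remember and associate, a larger brain, more complex concepts, and higher-dimensional associations produce extremely effective experience. For example, if the scientist decides the pairing of taste and shape according to whether feeding happens by day or by night (under different ambient light), a clever fish learns the following experience: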

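[Diagram: associations among the contexts "at night" and "in the daytime", the shapes "square" and "round", and the outcomes "tasty" and "unpalatable"]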
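The kind of context-dependent experience described above can be sketched as a small associative memory. The class and method names are hypothetical, and the reversed daytime pairing in the usage lines is an assumption made only to show that context matters; the text does not spell it out.

```python
from collections import defaultdict


class FishMemory:
    """Stores, for each (context, shape) pair, how often the food was tasty."""

    def __init__(self):
        # key: (context, shape) -> [times tasty, times eaten]
        self.counts = defaultdict(lambda: [0, 0])

    def feed(self, context, shape, tasty):
        """Write one feeding experience into memory."""
        record = self.counts[(context, shape)]
        record[0] += int(tasty)
        record[1] += 1

    def expect_tasty(self, context, shape):
        """Read memory: True/False from past experience, None if never seen."""
        tasty, total = self.counts.get((context, shape), (0, 0))
        if total == 0:
            return None
        return tasty / total >= 0.5


fish = FishMemory()
for _ in range(5):
    fish.feed("night", "square", True)
    fish.feed("night", "round", False)
    fish.feed("day", "square", False)  # assumed: the pairing is reversed by day
    fish.feed("day", "round", True)

print(fish.expect_tasty("night", "square"))
print(fish.expect_tasty("day", "square"))
```

Unlike the reflex of the coelenterate, the decision now depends on stored state, which is the decoupling of perception and action this section describes.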

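In terms of the thinking process above, the simple experience in human thinking is no different from that of a fish. For example, after some training, I can tell from the packaging whether a bag of sunflower seeds is fake or genuine. Compared with fish, humans merely observe from more angles; the essence is the same.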

4 Connecting concepts and reasoning

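Imagine a cheetah, facing into the wind, about to burn precious energy on a sprint at a herd of antelope. It watches, listens, and sniffs carefully, assessing which animal it has the best chance of catching. Note that the diagram below is drawn from the point of view of a speaking human. A cheetah's concepts of smell and coat color are completely different from those of a speaking human. Most likely such a concept is a replay from the visual memory area, or some other simplified reference.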

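[Diagram: candidate A (small build, sour smell, rapid abdominal breathing) → unhealthy juvenile; candidate B (dull markings, injured leg / no horns, sparse coat) → injured, aging female; distance comparison → hunt A]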

If the cheetah is a mother, she may bring more concepts into the decision:

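[Diagram: candidate A (small build, sour smell, rapid abdominal breathing) → unhealthy juvenile; candidate B (dull markings, injured leg / no horns, sparse coat) → injured, aging female; "I have 4 cubs" → "I need prey more than half my body size"; decision → hunt B]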

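The cheetah's brain should already hold many experiences and concepts: the most basic concepts of an object's size, smell, and texture, an assessment of the prey's breathing (a concept of time), and an estimate of how much her cubs eat. Thinking is the process of adjusting the order of and links between these concepts, and applying experience to reach the decision most likely to pay off.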
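As a rough sketch of this kind of decision, the two hunting scenarios can be written as a filter plus a score. All numbers, field names, and the scoring rule (catch chance divided by distance) are hypothetical; the only thing taken from the text is that a mother with cubs needs prey larger than half her own body size.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    catch_chance: float  # estimated from cues such as breathing, injuries, coat
    distance: float      # distance to the animal, in metres
    size: float          # prey size relative to the hunter's own body


def choose_prey(candidates, cubs=0):
    """Pick the most promising target, given the hunter's current needs."""
    # A mother with cubs needs prey larger than half her own body size.
    min_size = 0.5 if cubs > 0 else 0.0
    viable = [c for c in candidates if c.size > min_size]
    if not viable:
        return None
    # Prefer animals that are easy to catch and close by.
    return max(viable, key=lambda c: c.catch_chance / c.distance)


herd = [
    Candidate("A: unhealthy juvenile", catch_chance=0.8, distance=40.0, size=0.3),
    Candidate("B: injured aging female", catch_chance=0.6, distance=50.0, size=0.8),
]

print(choose_prey(herd).name)          # no cubs: the easier target
print(choose_prey(herd, cubs=4).name)  # four cubs: the larger target
```

The same observations lead to different decisions once one more concept, the needs of the cubs, is linked in.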

In fact, humans spend most of their time making similar decisions.

[Diagram: off work late; the child is hungry; it is hot today → buy liangpi (cold noodles), his favorite]

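Even when we do not speak, the vast majority of our thinking is strung together from one concept after another. A species that cannot speak will also form concepts. But species may differ in their ability to use concepts and in the way they use them.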

5 Words are the index of concepts

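The concept a cheetah builds may be a concrete, floating image. Now try to imagine that you cannot speak at all, and think about something: say, the ingredients of a plate of scrambled eggs with cucumber. You will find the shadow of a cucumber floating in front of you, along with round eggs, and perhaps the tastes of oil, salt, soy sauce, and vinegar. Such concepts are historical sensor records retrieved from the brain's visual, gustatory, and other perceptual areas. Retrieving such a record is like fetching a lengthy piece of data from a mechanical hard disk: very time-consuming.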

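If a creature cannot build indexes for its concepts, its thinking is painfully slow. Imagine processing 100 data blocks in a program: we would obviously rather identify each block by an ID than tell them apart by their full contents. The indexes that higher organisms form for complex concepts are words.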

[Diagram: striped; four hooves; mane; smells bad → zebra]
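The programming analogy in this section can be sketched directly. The record size and the stored words are hypothetical stand-ins for replayable sensory data; the point is only the contrast between finding a concept by its full contents and finding it by a short ID.

```python
import random

random.seed(0)
RECORD_SIZE = 10_000  # hypothetical size of one replayable sensory record

# Full "sensory records" (stand-ins for replayed sight, smell, taste), keyed by word.
memory = {
    word: [random.random() for _ in range(RECORD_SIZE)]
    for word in ("zebra", "cucumber", "egg")
}


def recall_by_content(record):
    """Without an index: compare the full record against every stored one."""
    for word, stored in memory.items():
        if stored == record:  # may touch up to RECORD_SIZE numbers per comparison
            return word
    return None


def recall_by_word(word):
    """With an index: a single hash lookup on a short ID."""
    return memory.get(word)


# Thinking in words: a dish is just a short list of IDs, not two huge records.
dish = ["cucumber", "egg"]
print(dish, [len(recall_by_word(w)) for w in dish])
```

Reasoning over the short list `dish` never has to replay the underlying records unless the details are actually needed.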

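The links and combinations between concepts are multi-layered and interwoven. There are containment relations as well as links that cross levels. As a bold guess, creatures that cannot speak also have something like words: an "ID" that refers to concrete things. This "ID" is used for thinking inside the individual brain, but existing scientific methods cannot detect or study it.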

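This "ID", or word, should be a physicochemical structure that can be expressed by the state of a small number of ganglia, rather than a complex vector such as vision or taste that must support replay. This benefits both the efficiency of thinking and the consumption of energy such as glucose.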

6 Language is the interface of concepts

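For social animals, exchanging information across individuals is a huge advantage, because it allows experience to be passed from one individual to another. For transmitting concepts, there are two evolutionary routes.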

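One is to transmit compressed referential symbols. Species that use low-frequency sound waves, such as most animals that call, and humans, are limited by the time cost of communication and tend to convey information with simple symbols. Take lemurs or small rodents: even though they cannot speak as humans do, they can use calls of different length and pitch to warn each other of the arrival of predators such as eagles. But why a call of three long and two short notes should mean "eagle" most likely has to be demonstrated by a companion actually being killed, that is, an on-the-spot lesson that associates the call with an eagle attack. This is because the lemur's language system cannot express enough concepts to explain the whole eagle business while still safe in the burrow.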

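The other is to transmit the data itself. Species that master high-frequency sound waves, such as dolphins and bats, may very well be able, in addition to the abstract language above, to replay the images from their pulse-Doppler nervous system for concrete communication, that is, raw sensor signals without compression or encoding. In this way the young at home could learn from a "recording" of their mother's hunt (most likely similar to an ultrasound image), and perhaps, combined with the first kind of language, roughly describe the difference between sardines and swordfish.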

[Figures: abstract waveform interface; concrete waveform interface]

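Of these two ways of communicating, I find the second very cool. But the first has one huge advantage: it can describe things that do not exist in nature.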

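  • Hot and cold, light and dark, bitter and spicy, color and shape: concept clusters composed of the primitive concepts collected by the sense organs are direct, concrete concepts. However complex the combination, it can be broken down into the intuitive sensations of individual sense organs.
  • Lazy, diligent, kind, flexible, tomorrow, yesterday: these summaries of regularities, and descriptions of the deeper causes that drive how things change, cannot be replayed by any specific sensory nervous system.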

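A complex nervous system can generate very deep abstract concepts from shallow, concrete ones, allowing the individual to think from historical, temporal, and dialectical perspectives. At the same time, by tirelessly describing these concepts through language, knowledge can be inherited across individuals and across time (generations).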

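The complexity of human language lets us, by combining and modifying simple words and sentences (some languages, such as agglutinative ones, differ slightly), narrate and teach a complex concept step by step over hours or even days. We can also set a scene and, under a "what if" premise, explain something like a saber-toothed tiger attack. This ability lets humans teach a great deal of knowledge without an on-the-spot demonstration.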

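Think about how we went from knowing only how to eat latiao (spicy snack sticks) as children to designing industrial products with calculus and the FFT. The education we go through is a continuous process of building and linking concepts. This process is so painful that many people are still in school at 30.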

7 Natural language models come very close to human thinking

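From the discussion above, one can sense that natural language models emphasize associating and reasoning over context, which is very close to how human thinking works. A computer does not know what "cold" means, but when we think, we do not carefully savor the feeling of "cold" either; we simply pass over it.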

"It's cold, put on some clothes"

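Most people, when thinking about or communicating these few words, do not deliberately dwell on the concrete meaning of "cold", such as shivering in a cold storage room; they simply apply the experience. In other words, we can perfectly well build a conceptual model of "cold" in a computer.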

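[Diagram: concept model of "cold": skin cells sense a temperature difference; temperature reflects how intensely microscopic particles move; low-temperature environment; microscopic particles are relatively calm; warm-blooded animals; constant body temperature]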

A natural language model does not need to feel cold; it can build up these concepts by learning from human writing.
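As a toy illustration of building concept links purely from text, the sketch below counts which words appear together in a tiny made-up corpus. This is not how large language models are trained (they learn dense representations by predicting text); the corpus, the stop-word list, and the function names are all hypothetical, and the counting is only meant to show that associations around "cold" can emerge without any sensation.

```python
from collections import Counter
from itertools import combinations

# A tiny invented corpus standing in for "human writing".
corpus = [
    "it is cold so wear a coat",
    "cold weather means low temperature",
    "in cold weather wear warm clothes",
    "it is hot so eat cold noodles",
]
STOP_WORDS = {"it", "is", "so", "a", "in", "means"}


def cooccurrence(sentences):
    """Count how often two content words share a sentence."""
    pairs = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()) - STOP_WORDS)
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
    return pairs


def neighbours(word, pairs):
    """Concepts linked to `word`, weighted by how often they co-occur."""
    related = Counter()
    for (a, b), n in pairs.items():
        if a == word:
            related[b] += n
        elif b == word:
            related[a] += n
    return related


links = neighbours("cold", cooccurrence(corpus))
print(links.most_common(3))
```

Here "wear" and "weather" end up most strongly tied to "cold", even though nothing in the program has ever felt a temperature.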

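GPT is trained with the goal of generating correct text, which obviously differs from the goals of human thinking. But the goals of human thinking, such as eating well, dressing warmly, and having fun, can themselves be described in language and thus become training goals for GPT. In this sense, language is the symbolic representation of thought: as long as correct results are obtained at the level of language, this thinking black box is not essentially different from the brain.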

8 AI with the ability to think and act

An AI with the ability to think and act might use a natural language model as its thinking engine, a perception model such as a CNN as its sensor engine, and a dedicated automatic-control model to drive its movement module. Judging from the current level of technology, creating bionic artificial intelligence, or artificial consciousness, may not be far off.
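The three-engine architecture can be sketched as a simple pipeline. The toy functions below are hypothetical placeholders, not real models: in the architecture described above, a CNN would stand where toy_perceive is, a language model where toy_think is, and a control model where toy_act is.

```python
from typing import Callable, List


def agent_step(
    frame: bytes,
    perceive: Callable[[bytes], str],
    think: Callable[[str], str],
    act: Callable[[str], List[float]],
) -> List[float]:
    """One perceive, think, act cycle through three pluggable engines."""
    percept = perceive(frame)  # sensor engine: raw data -> description
    plan = think(percept)      # thinking engine: description -> plan in words
    return act(plan)           # movement module: plan -> motor commands


# Toy stand-ins so the pipeline runs end to end.
def toy_perceive(frame: bytes) -> str:
    return "bright" if sum(frame) / max(len(frame), 1) > 127 else "dark"


def toy_think(percept: str) -> str:
    return "retreat" if percept == "bright" else "advance"


def toy_act(plan: str) -> List[float]:
    return [-1.0, -1.0] if plan == "retreat" else [1.0, 1.0]


print(agent_step(bytes([255] * 4), toy_perceive, toy_think, toy_act))
```

Passing language (here, plain strings) between the stages mirrors the idea of language as the interface between concepts.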

To avoid potential safety issues, the main thing is to figure out how to pre-train the model. Is a newborn's brain a blank slate? Obviously not. After hundreds of millions of years of evolution, our DNA has hard-wired some basic and essential things into the biological blueprint. It is very important to understand the structure and weights of this part, which was formed by natural selection. This structure determines many pre-wired, fixed properties of the brain, such as the guarantee of basic survival and the fixing of basic concepts such as pain and sweetness.

For an AI, the pre-training process corresponds to the realization of this part. If you want to build a bionic human, some concepts in the pre-training part are especially important.

  • Inertia: a tendency to sleep when there is no stimulating input.
  • Pain: features that need to be avoided should be set as pain, which is the penalty in the algorithm.
  • Goals: features that need to be pursued are set as goals. For a robot that takes care of patients, for example, the reward is that the patients are happy.
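The three pre-wired concepts above can be sketched as a scalar training signal plus a wake/sleep rule. The weight, the threshold, and the input scales are hypothetical; the sketch only shows pain entering as a penalty, the goal entering as a reward, and inertia as a default to sleep.

```python
PAIN_WEIGHT = 2.0  # hypothetical: how strongly pain is penalised


def reward(pain: float, patient_happiness: float) -> float:
    """Goals add to the training signal, pain subtracts a penalty."""
    return patient_happiness - PAIN_WEIGHT * pain


def next_mode(stimulus: float, wake_threshold: float = 0.1) -> str:
    """Inertia: with no stimulating input, the agent tends to sleep."""
    return "sleep" if stimulus < wake_threshold else "act"


print(reward(pain=0.0, patient_happiness=1.0))
print(next_mode(stimulus=0.0))
```

In a real system these signals would feed a learning algorithm; here they are left as plain functions.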

At the same time, the pre-trained model also covers a rapid version of the acquired learning stage, such as associating basic concepts and establishing basic logic. For robots equipped with sensors, concepts such as "red" and "edge" can be connected to specific CNN slices and thresholds to form real perception. When the bionic person starts working (for example, on entering a nursing home), it can continuously optimize its service strategy through feedback from its sensors, its motion systems, and the doctors.


  1. The Road to AGI: Essentials of Large Language Model (LLM) Techniques


Reposted from: blog.csdn.net/goldenhawking/article/details/129964018