Introduction to AI: the basics of machine learning

This chapter covers:

  • The relationship between AI and the human brain;
  • Modeling input and output;
  • Classification and regression;
  • Time series;
  • Training.

Laymen tend to think of artificial intelligence as an artificial brain and to associate it with the robots of science fiction movies, but today's artificial intelligence has little to do with that fiction. Artificial intelligence does resemble the human brain in some ways. The significant difference is that it is artificial: it does not need to have biological properties.

Before going further, we need to introduce some general concepts for interacting with artificial intelligence algorithms. An artificial intelligence algorithm is also called a "model"; in essence, it is a technique for solving a problem. Many artificial intelligence algorithms exist, each with its own characteristics. The most commonly used are neural networks, support vector machines, Bayesian networks, and hidden Markov models, and this series examines each of them.

For AI practitioners, it is critical to know how to model a problem in a form that an artificial intelligence program can handle, because this is the primary way of interacting with these algorithms. We begin the chapter by looking at how the human brain interacts with the real world.

1.1 Relationship to the human brain

The goal of AI is to make computers work like the human brain, but that does not mean AI needs to emulate the brain in every respect. The degree to which an artificial intelligence algorithm matches the actual functioning of the human brain is called its "biological plausibility".

Christof Koch, chief scientific officer of the Allen Institute for Brain Science, has called the brain "the most complex thing in the known universe" [1]. Within the discipline of artificial intelligence, the brain is essentially an advanced and complicated piece of technology that we need to study: we reverse-engineer how it works and what mechanisms it uses in order to mimic its function.

Of course, the brain is not the only "advanced technology" we have learned from nature. Flight is another. Early aircraft tried to imitate the flapping wings of birds, but these flapping-wing aircraft, called ornithopters, performed poorly. Figure 1-1 shows the patent drawing for the Gray Goose ornithopter.

 

Figure 1-1 Gray Goose ornithopter (U.S. Patent No. 1,730,758)

In the early 20th century, living birds were the only reference model for aircraft. That seemed reasonable; after all, birds are the experts at flying. Flight tests showed, however, that humans should not try to copy nature's solution in full. The only thing we want to imitate is the end result, flight, and copying the way birds fly did not produce a practical aircraft.

The abstraction of "emulation" appears in many contexts. My MacBook Pro, for example, can emulate a Windows PC, and it can also emulate a Commodore 64. The C64 is old in more than appearance: its instruction set is quite different from the general-purpose Intel x86 instruction set that drives many of today's computers. When the Mac emulates a C64, it does not simulate the actual transistor structure of the C64's 6510 microprocessor; it emulates at a higher level of abstraction. The same holds for artificial intelligence. Some algorithms simulate neurons, while others, like the C64 emulator, work at a more abstract level: we only care about providing the corresponding function on the computer, and reaching that goal does not require simulating the full process by which the brain produces it.

The end result matters more than the form. At a highly abstract level, the human brain and most artificial intelligence algorithms have much in common, and this section presents the evidence.

1.1.1 The brain and the real world

Before starting the discussion, we need to look at how the brain works from an external perspective. After all, although we know little about the brain's internal mechanisms, we understand its external behavior quite well.

In essence, the brain is a black box connected to nerves, which carry signals between the brain and the body. A specific set of inputs produces a specific output. For example, when you feel that your finger is about to touch a hot stove, other nerves send instructions to your muscles to pull the finger back.

Another important point is that the brain has an internal state. Consider what happens when you suddenly hear a horn. Your reaction depends not only on the sound itself but also on where and when you hear it: a horn heard in the middle of a movie and a horn heard while you cross a busy street cause different reactions. Your surroundings put your brain into a particular internal state, so the brain responds differently in different situations.

The order in which stimuli are received also matters. In one common game, you close your eyes and try to identify an object by touch alone. You cannot get enough information from the first grasp to determine what the object is; you have to keep moving your fingers over it to feel its shape until you have enough information to form a picture of the object and decide what it is.

You can think of the human brain as a black box with a series of inputs and outputs. Our nerves provide all of our knowledge of the world; they are the brain's input signals, and for a normal brain the number of inputs is finite.

Likewise, the only channel through which we act on the real world is the output signals that travel from the nerves to the muscles. The output of the human brain is a function of its input signals and its internal state: for any input, the brain adjusts its internal state and produces an output. How much the order of the input signals matters depends on the brain's internal state at the time.

1.1.2 Brain in a vat

If our only interaction with the real world is input received from sensory receptors and output sent through motor nerves, then what is "real"? Your brain might be hooked up to a simulation, like the devices attached to bodies in the movie "The Matrix". If the brain's outputs produce the expected input feedback, how would you tell the real from the unreal?

This is the well-known philosophical thought experiment called the "brain in a vat", illustrated in Figure 1-2. The brain in the figure believes that its body is walking a dog, but does the brain really have a body? Does the dog even exist? What does the word "exist" mean in the first place? Everything we know is nothing more than signals delivered by the nervous system [2].

 

Figure 1-2 Brain in a vat

The thought experiment supposes that a person's brain can be separated from the body and kept alive by a life-support system. The brain's nerves are connected to a supercomputer that can fully simulate the electrical impulses the brain would normally receive, and the supercomputer responds to the brain's output signals in an appropriate way to simulate the real world. As a result, the disembodied brain continues to have a completely normal conscious experience of an outside "real world". There is even a philosophical theory that we are living in a simulated world [3].

One algorithm that attempts to model the human brain directly is the "neural network". Neural networks are a small branch of artificial intelligence research, and they have a surprising amount in common with many of the algorithms you will learn about in this series.

Computer-based neural networks differ from the human brain in that they are not general-purpose. Existing neural networks can each solve only a particular problem and have a limited range of application. An AI algorithm perceives reality through the input it receives, and it produces output based on that input and its internal state. Consequently, the algorithm's "reality" often changes as researchers run different experiments.

Whether you are programming an artificial intelligence for a robot or for picking stocks, the model of input data, output data, and internal state applies to most artificial intelligence algorithms. We say "most" because there are, of course, more complex algorithms.

1.2 Modeling problems

Knowing how to model a real-world problem for a machine learning algorithm is essential. Different algorithms suit different problems. At the most abstract level, you can model your problem in one of four ways:

  • Data classification;
  • Regression analysis;
  • Clustering;
  • Time series.

Of course, sometimes several of these approaches must be combined to model a problem. The following sections explain each approach in turn, starting with data classification.

1.2.1 Data classification

Classification attempts to assign the input data to a class. It is usually a supervised learning task, in which the user provides both the input data and the expected output to the machine learning algorithm. For data classification, the expected output is the class of the data.

Supervised learning always works with known data. During training, the machine learning algorithm is evaluated on how well it classifies that known data. Ideally, once trained, the algorithm can also classify unknown data correctly.

Fisher's iris data set [4], which contains measurements of iris flowers, is a classic classification example. It is one of the most famous data sets and is often used to evaluate the performance of machine learning algorithms. The complete data set is available at the following URL:

http://www.heatonresearch.com/wiki/Iris_Data_Set[5]

The following is a small sample of the data set.

 
  "Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
  5.1,3.5,1.4,0.2,"setosa"
  4.9,3.0,1.4,0.2,"setosa"
  4.7,3.2,1.3,0.2,"setosa"
  ...
  7.0,3.2,4.7,1.4,"versicolor"
  6.4,3.2,4.5,1.5,"versicolor"
  6.9,3.1,4.9,1.5,"versicolor"
  ...
  6.3,3.3,6.0,2.5,"virginica"
  5.8,2.7,5.1,1.9,"virginica"
  7.1,3.0,5.9,2.1,"virginica"

The data above is in comma-separated values (CSV) format, a very common input format in machine learning. As the sample shows, the first line of the file typically defines the contents of each column. In this sample, each flower has five pieces of information:

  • Sepal length;
  • Sepal width;
  • Petal length;
  • Petal width;
  • Species.
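The CSV layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name load_iris and the inline sample string are illustrative assumptions, not part of the book's code. It treats the first four columns as numeric inputs and keeps the last column as a label.

```python
import csv
import io

# A few rows copied from the sample above (hypothetical inline data).
IRIS_SAMPLE = '''"Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
5.1,3.5,1.4,0.2,"setosa"
7.0,3.2,4.7,1.4,"versicolor"
6.3,3.3,6.0,2.5,"virginica"
'''

def load_iris(text):
    """Split CSV text into numeric feature rows and species labels."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first line names the columns
    features, labels = [], []
    for row in reader:
        features.append([float(v) for v in row[:-1]])  # four measurements
        labels.append(row[-1])                          # species label
    return header, features, labels

header, features, labels = load_iris(IRIS_SAMPLE)
classes = sorted(set(labels))  # the distinct species seen in the data
```

The set of distinct labels is collected up front because the labels found in the data are the only ones a trained algorithm can ever be asked to return.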

For classification, the algorithm is given the length and width of the sepals and petals and must determine the species of the flower. The species is the "class" to which the flower belongs [6].

"Class" is generally a non-numeric attribute, so membership in a class must be well defined. For example, the iris data set contains only three species of iris, so a machine learning algorithm trained on it cannot be expected to recognize a rose. In addition, every class must be known at training time.

1.2.2 Regression analysis

In section 1.2.1, we learned how to classify data. Often, however, the desired output is not a category but a number. To calculate the fuel efficiency of a car, for example, the algorithm is given the engine size and the weight of the car and should compute the fuel efficiency of that specific vehicle.

The following sample shows fuel-efficiency data for five car models:

 
  "mpg","cylinders","displacement","horsepower","weight","acceleration","modelyear","origin","carname"
  18.0,8,307.0,130.0,3504.,12.0,70,1,"chevroletchevellemalibu"
  15.0,8,350.0,165.0,3693.,11.5,70,1,"buickskylark320"
  18.0,8,318.0,150.0,3436.,11.0,70,1,"plymouthsatellite"
  16.0,8,304.0,150.0,3433.,12.0,70,1,"amcrebelsst"
  17.0,8,302.0,140.0,3449.,10.5,70,1,"fordtorino"
  ...

The complete data set is available at the following URL:

http://www.heatonresearch.com/wiki/MPG_Data_Set[7]

Regression analysis trains an algorithm on the input data, here the car data, so that it can compute a specific output from the inputs. In this example, the algorithm must produce the most likely fuel efficiency for a given car model.

Note also that not every field in the data file is useful. The "car name" and "origin" columns, for example, are not used. The car name is excluded because it has no bearing on fuel efficiency, and "origin" is excluded because it has little to do with fuel efficiency. The origin field is a number identifying the region where the car was produced, and some regions do place more emphasis on fuel efficiency, but this information is too broad to be useful and is therefore dropped.
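The column selection just described can be sketched as follows. The names split_rows, TARGET, and DROPPED are hypothetical, and the two data rows are copied from the sample; the sketch only prepares the inputs and the target value and does not perform the regression itself.

```python
import csv
import io

MPG_SAMPLE = '''"mpg","cylinders","displacement","horsepower","weight","acceleration","modelyear","origin","carname"
18.0,8,307.0,130.0,3504.,12.0,70,1,"chevroletchevellemalibu"
15.0,8,350.0,165.0,3693.,11.5,70,1,"buickskylark320"
'''

TARGET = "mpg"                   # the numeric value the regression should predict
DROPPED = {"origin", "carname"}  # columns excluded in the text above

def split_rows(text):
    """Return (inputs, targets) with the dropped columns removed."""
    inputs, targets = [], []
    for row in csv.DictReader(io.StringIO(text)):
        targets.append(float(row[TARGET]))
        inputs.append([float(value) for name, value in row.items()
                       if name != TARGET and name not in DROPPED])
    return inputs, targets

inputs, targets = split_rows(MPG_SAMPLE)
```

Each input row keeps six numeric columns (cylinders, displacement, horsepower, weight, acceleration, model year), and the matching target is the mpg value.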

1.2.3 Clustering

Clustering is similar to classification: in both, the computer groups the input data. Before training begins, the programmer usually specifies the number of clusters, and the computer then places similar items together based on the input data. Because the input does not say which cluster any item belongs to, clustering algorithms are extremely useful when no target output is available. And because no expected output is specified, clustering is unsupervised learning.

Recall the car data from section 1.2.2. You could use a clustering algorithm to divide the cars into four groups, each containing models with similar characteristics.

Clustering differs from classification in that a clustering algorithm has more freedom to find patterns in the data on its own, whereas a classification algorithm must be told the known classes of the training data so that it can eventually identify new data it was not trained on.

Clustering and classification algorithms also handle new data very differently. The ultimate goal of a classification algorithm is to recognize new data correctly on the basis of the data it was trained on. A clustering algorithm has no notion of "new data": to add new data to existing groups, the entire data set must be clustered again.
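To make the idea of unsupervised grouping concrete, here is a small sketch that clusters the five car weights from section 1.2.2 into two groups. It uses a simple one-dimensional k-means loop, which is one common clustering technique chosen here for illustration and not necessarily the method the text has in mind. The function name kmeans_1d and the choice of the first k values as starting centroids are assumptions.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters; no expected output is supplied."""
    centroids = list(values[:k])  # assumption: first k values seed the clusters
    labels = []
    for _ in range(max_iter):
        # Assign every value to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

weights = [3504.0, 3693.0, 3436.0, 3433.0, 3449.0]
labels, centroids = kmeans_1d(weights, k=2)
```

Adding a sixth car means calling kmeans_1d again on all six weights, which mirrors the point that clustering has no separate step for new data.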

1.2.4 Time series

A machine learning algorithm works somewhat like a mathematical function that maps input values to a specific output value. If the algorithm has no "internal state", a given set of inputs always produces the same output. Many machine learning algorithms have no internal state that can change or affect the output. With the car data, for instance, you want the classification algorithm to fit all of the data, not just the last few cars it received.
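The difference between an algorithm with and without internal state can be sketched with two toy models. Both are hypothetical illustrations (the averaging and the 0.5 blending factor are arbitrary assumptions), and neither is a real learning algorithm.

```python
def stateless_model(inputs):
    """No internal state: the same input always gives the same output."""
    return [sum(inputs) / len(inputs)]

class StatefulModel:
    """Keeps a running value, so earlier inputs influence later outputs."""

    def __init__(self):
        self.state = 0.0  # internal state (short-term memory)

    def compute(self, inputs):
        mean = sum(inputs) / len(inputs)
        # Blend the new input with what the model remembers.
        self.state = 0.5 * self.state + 0.5 * mean
        return [self.state]

model = StatefulModel()
first = model.compute([1.0, 3.0])
second = model.compute([1.0, 3.0])  # same input, different output
```

Because the stateful model's output depends on what it has already seen, the order in which inputs arrive matters for it and not for the stateless one.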

Order is often very important. Some machine learning algorithms support time series, and others do not. If you are only classifying cars or irises, you do not need to be concerned with order. But if the only input is the current stock price, order plays an important role: a single day's price is of little help in predicting price movement, whereas the trend over a longer span of days can be much more useful.

There are ways to encode time-series data for algorithms that do not support time series: you include the values from previous days as part of the input. For example, you can use five inputs, representing the previous five trading days, to predict the following day.

1.3 Modeling input and output

As mentioned earlier in this chapter, a machine learning algorithm produces an output for a given input, and that output is also influenced by the algorithm's long-term memory. Figure 1-3 shows how short-term and long-term memory take part in producing the output.

 

Figure 1-3 Abstract view of a machine learning algorithm

As Figure 1-3 shows, the algorithm accepts input and generates output. For most machine learning algorithms, input and output are fully synchronized: output is produced only when input is given. The human brain, by contrast, not only responds to input but can occasionally produce output of its own in the absence of input.

So far we have discussed input and output only in the abstract, and you may be curious about what they actually look like. In practice, both input and output are vectors, and a vector is essentially an array of floating-point numbers, as shown below:

 
  Input: [-0.245, 0.283, 0.0]
  Output: [0.782, 0.543]

For the vast majority of machine learning algorithms, the number of inputs and outputs is fixed, just as it is for a function in a computer program. The input data can be regarded as the function's parameters, and the output as its return value. In the example above, the algorithm accepts three input values and returns two output values. These counts generally do not vary, so for a particular algorithm the number of elements in the input and output patterns does not change.

To use an algorithm, you must express your specific problem as an array of floating-point numbers, and the solution will likewise be an array of floating-point numbers. This really is all that most algorithms can do: put bluntly, a machine learning algorithm simply turns one array into another.

In traditional programming practice, many pattern recognition algorithms act somewhat like a hash table that maps keys to values. A hash table in turn closely resembles a dictionary, because each entry corresponds to a meaning. A hash table typically looks like this:

  • "hear" -> "to perceive or apprehend by the ear";
  • "run" -> "to go faster than a walk";
  • "write" -> "to form (as characters or symbols) on a surface with an instrument (as a pen)".

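This hash table maps words to their definitions: the key is a string, and the value is also a string. Given a key (a word), the hash table returns a value (the definition of that word). This is how most machine learning algorithms work.

In any program, a hash table consists of key-value pairs. The input pattern presented to the input layer of a machine learning algorithm is analogous to a "key" in a hash table, and the pattern returned from the output layer is analogous to a "value". The only difference is that a machine learning algorithm is more complex than a simple hash table.

Another question is what happens if we pass the hash table above a key that is not in the map, such as "wrote". The hash table returns a null value or otherwise indicates that the key was not found. A machine learning algorithm behaves differently: it does not return null but instead returns the closest match or the probability of a match. If you pass "wrote" to the algorithm above, you will most likely get the value for "write", which is what you wanted.

A machine learning algorithm not only finds the closest match but also adjusts its output to accommodate missing values. Of course, the example above does not contain enough data for the algorithm to adjust its output, since it has only three entries. With so little data, "closest match" has little real meaning.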
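The contrast between an exact lookup and a closest match can be sketched in Python. The sketch uses string similarity from the standard difflib module as a stand-in for "closest match"; this is an assumption made for illustration and is not how a trained machine learning algorithm measures closeness. The function names are hypothetical.

```python
import difflib

DEFINITIONS = {
    "hear": "to perceive or apprehend by the ear",
    "run": "to go faster than a walk",
    "write": "to form (as characters or symbols) on a surface "
             "with an instrument (as a pen)",
}

def exact_lookup(word):
    """Hash-table behavior: a missing key yields None."""
    return DEFINITIONS.get(word)

def closest_lookup(word):
    """Always answer with the entry whose key is most similar to the word."""
    # cutoff=0.0 makes difflib return the best candidate however weak it is.
    best = difflib.get_close_matches(word, list(DEFINITIONS), n=1, cutoff=0.0)
    return DEFINITIONS[best[0]]

missing = exact_lookup("wrote")     # None: the key is not in the table
nearest = closest_lookup("wrote")   # the definition stored under "write"
```

With only three entries, the closest match is forced to be one of three answers, which is why the text notes that the idea means little on so small a data set.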

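This mapping also raises another key question: if an algorithm accepts one floating-point array and returns another, how can we pass it a string? Numeric data is easier to handle than strings, but there is a way to do it, described below.

The bag-of-words algorithm [8] is a common way to encode strings. In this model, each input value represents the number of times a particular word occurs, and these counts make up the input vector. Consider the following strings:

  Of Mice and Men
  Three Blind Mice
  Blind Man's Bluff
  Mice and More Mice

From these strings we obtain the following unique words, which form our "dictionary":

  Input 0: and
  Input 1: blind
  Input 2: bluff
  Input 3: man's
  Input 4: men
  Input 5: mice
  Input 6: more
  Input 7: of
  Input 8: three

The four strings can therefore be encoded as follows:

  Of Mice and Men [0 4 5 7]
  Three Blind Mice [1 5 8]
  Blind Man's Bluff [1 2 3]
  Mice and More Mice [0 5 6]

We must also fill in a 0 for every dictionary word that does not appear in a string, which gives the following final result:

  Of Mice and Men [1,0,0,0,1,1,0,1,0]
  Three Blind Mice [0,1,0,0,0,1,0,0,1]
  Blind Man's Bluff [0,1,1,1,0,0,0,0,0]
  Mice and More Mice [1,0,0,0,0,2,1,0,0]

Note that because the dictionary contains nine words, we obtain fixed-length vectors of length 9. The value of each element is the number of times the corresponding dictionary word occurs, and the position of each element in the vector matches the index of that word in the dictionary. The words in any one string are only a subset of the dictionary, so most of the values in a vector are 0.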
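The encoding steps above can be sketched in a few lines of Python. The function names build_dictionary and encode are hypothetical, and splitting on whitespace after lowercasing is a simplifying assumption that happens to be enough for these four titles.

```python
TITLES = [
    "Of Mice and Men",
    "Three Blind Mice",
    "Blind Man's Bluff",
    "Mice and More Mice",
]

def build_dictionary(titles):
    """Collect the unique lowercase words in alphabetical order."""
    words = {word for title in titles for word in title.lower().split()}
    return sorted(words)

def encode(title, dictionary):
    """Count how often each dictionary word occurs in the title."""
    tokens = title.lower().split()
    return [float(tokens.count(word)) for word in dictionary]

dictionary = build_dictionary(TITLES)
vectors = [encode(title, dictionary) for title in TITLES]
```

Every vector has the same length as the dictionary, so the encoded titles can all be fed to an algorithm with nine inputs.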

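As the example shows, one of the defining features of machine learning programs is that they model a problem as a fixed-length array of floating-point numbers. The following sections use several examples to show how this modeling is done.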

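1.3.1 A simple example

If you have read anything about machine learning, you have surely come across the XOR (exclusive or) operator. An AI program that imitates the XOR operation is the "Hello World" of artificial intelligence. This book covers material far more complex than XOR, but XOR remains the best introductory example, and we start with it too. First, think of it as a hash table. If you are not familiar with XOR, compare it with the AND and OR operators, which work in much the same way: each takes two inputs and produces a single Boolean output. AND is true when both inputs are true; OR is true when at least one input is true.

For XOR, the output is true only when the two inputs differ. The truth table for XOR is as follows:

  False XOR False = False
  True XOR False = True
  False XOR True = True
  True XOR True = False

Expressed as a hash table, the truth table looks like this:

  [0.0,0.0] -> [0.0]
  [1.0,0.0] -> [1.0]
  [0.0,1.0] -> [1.0]
  [1.0,1.0] -> [0.0]

These mappings show the relationship between the inputs and the ideal expected outputs for this algorithm.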
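The same mapping can be written down as training data in Python. This sketch only stores the four input/ideal-output pairs and checks them against a hand-written reference; the names XOR_TRAINING and xor_reference are hypothetical, and no learning takes place here.

```python
# Each entry pairs an input array with its ideal output array.
XOR_TRAINING = [
    ([0.0, 0.0], [0.0]),
    ([1.0, 0.0], [1.0]),
    ([0.0, 1.0], [1.0]),
    ([1.0, 1.0], [0.0]),
]

def xor_reference(inputs):
    """Hand-coded XOR on float inputs, treating values >= 0.5 as true."""
    a, b = (value >= 0.5 for value in inputs)
    return [1.0 if a != b else 0.0]

matches = all(xor_reference(inputs) == ideal
              for inputs, ideal in XOR_TRAINING)
```

A trained algorithm would be judged by how closely its outputs for these four inputs come to the ideal values in the table.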

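1.3.2 Fuel efficiency

Machine learning problems usually involve processing a set of data and computing a prediction of some output or a decision about a course of action. Take a car database with the following fields:

  • Car weight;
  • Engine displacement;
  • Number of cylinders;
  • Power;
  • Hybrid or conventional;
  • Fuel efficiency.

Once you have collected data for these fields, you can build a model that predicts the value of one attribute from the values of the others. As an example, let us predict the fuel efficiency of a car.

First we must reduce the problem to an input array of floating-point numbers that maps to an output array of floating-point numbers, where each array element lies in the range 0 to 1 or -1 to 1. This step is called "normalization" and is covered in detail in Chapter 2.

Let us see how to normalize the example data. Consider the input and output format: there are six fields in total, and we want to use five of them to predict the sixth, so the algorithm has five inputs and one output.

The inputs and output of the algorithm look roughly like this:

  • Input 1: car weight;
  • Input 2: engine displacement;
  • Input 3: number of cylinders;
  • Input 4: power (kilowatts);
  • Input 5: hybrid or conventional;
  • Output 1: fuel efficiency.

To normalize the data, we first choose a reasonable range for each value and then convert each value to a number between 0 and 1 while preserving its relative size. The following ranges are reasonable choices:

  • Car weight: 45 to 2,268 kg;
  • Engine displacement: 0.1 to 10 liters;
  • Number of cylinders: 2 to 12;
  • Power: 1 to 736 kW;
  • Hybrid or conventional: true or false;
  • Fuel efficiency: 0.42 to 210 km/L.

These ranges are wider than today's cars require, but they ensure that the algorithm can continue to be used in the future without much restructuring. Wide ranges have another advantage: few values end up at the extremes.

Now consider an example: how do we normalize a weight of 900 kg? The weight range spans 2,223 kg (2,268 - 45), and 900 kg lies 855 kg above the minimum (900 - 45). As a fraction of the range this is about 0.38 (855 / 2,223), so we pass the value 0.38 to the algorithm's input to represent a weight of 900 kg. This also satisfies the common requirement that inputs lie between 0 and 1.

The "hybrid or conventional" value is a Boolean, true or false. Using 1 for hybrid and 0 for conventional normalizes the Boolean to the two values 1 and 0.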
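Range normalization of this kind can be sketched as a pair of small functions. The names normalize, denormalize, and encode_car are hypothetical, and the example car values other than the 900 kg weight are made up for illustration. The ranges are the ones listed above, used with their exact minimum and maximum, so 900 kg comes out at roughly 0.385.

```python
def normalize(value, low, high):
    """Map a value from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

def denormalize(fraction, low, high):
    """Map a normalized output back into its original units."""
    return low + fraction * (high - low)

def encode_car(weight_kg, displacement_l, cylinders, power_kw, is_hybrid):
    """Build the five-element input array for one car."""
    return [
        normalize(weight_kg, 45.0, 2268.0),
        normalize(displacement_l, 0.1, 10.0),
        normalize(cylinders, 2.0, 12.0),
        normalize(power_kw, 1.0, 736.0),
        1.0 if is_hybrid else 0.0,  # Boolean becomes 1 or 0
    ]

# Hypothetical car: only the 900 kg weight comes from the text.
inputs = encode_car(900.0, 1.6, 4, 85.0, True)
weight_input = inputs[0]
```

The denormalize function is the reverse step: it turns the algorithm's single output between 0 and 1 back into a fuel efficiency within the 0.42 to 210 km/L range.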

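1.3.3 Presenting images to the algorithm

Images are a common source of input for algorithms. This section introduces a method for normalizing images. The method is not sophisticated, but it works quite well.

Take a full-color image of 300 × 300 pixels. Its 90,000 pixels multiplied by three RGB color channels give 270,000 values in total. If we used every value as an input, there would be 270,000 inputs, which is too many for most algorithms.

We therefore need to downsample. Figure 1-4 shows a full-resolution image.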

 

Figure 1-4 A full-resolution image

We downsample it to 32 × 32 pixels, as shown in Figure 1-5.

 

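Figure 1-5 The downsampled image

Once the picture has been reduced to 32 × 32 pixels, its grid pattern lets us generate the algorithm's inputs pixel by pixel. If the algorithm only needs to distinguish the brightness of each pixel, 1,024 inputs are enough. Distinguishing brightness alone means the algorithm "sees" only in black and white.

If the algorithm should also recognize color, it must be given the red, green, and blue (RGB) intensity of each pixel. That means three inputs per pixel, which raises the number of inputs to 3,072.

RGB values usually range from 0 to 255. To create input for the algorithm, divide each intensity by 255 to obtain an intensity fraction. An intensity of 10, for example, becomes 0.039 (10/255).

You may also wonder how the output is handled. In this example, the output should indicate what the algorithm believes the picture shows. The usual solution is to create one output for each type of picture the algorithm should recognize. The trained algorithm returns a value of 1.0 on the output that corresponds to the type it believes the picture to be.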
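The image pipeline described in this section can be sketched without any imaging library by treating an image as rows of (r, g, b) tuples. Averaging square blocks of pixels is an assumed downsampling method, since the text does not say which one is used. The sketch also assumes a square image and ignores leftover border pixels when the size is not a multiple of the target size. All function names are hypothetical.

```python
def downsample(pixels, out_size):
    """Shrink a square RGB image by averaging square blocks of pixels."""
    block = len(pixels) // out_size
    result = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            cells = [pixels[y][x]
                     for y in range(by * block, (by + 1) * block)
                     for x in range(bx * block, (bx + 1) * block)]
            row.append(tuple(sum(cell[ch] for cell in cells) / len(cells)
                             for ch in range(3)))
        result.append(row)
    return result

def to_inputs(pixels):
    """Flatten the image and scale every channel from 0..255 to 0..1."""
    return [channel / 255.0 for row in pixels for pixel in row
            for channel in pixel]

def ideal_output(class_index, class_count):
    """One output per picture type; the correct type is set to 1.0."""
    return [1.0 if i == class_index else 0.0 for i in range(class_count)]

# Synthetic 64 x 64 image in a single color, used only for demonstration.
image = [[(10, 20, 30)] * 64 for _ in range(64)]
inputs = to_inputs(downsample(image, 32))
```

A 32 × 32 color image yields 3,072 inputs, and a red intensity of 10 becomes about 0.039, matching the figures in the text.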

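Section 1.3.4 continues with financial algorithms as a further example of how to format a real problem for an algorithm.

1.3.4 Financial algorithms

Financial forecasting is a common application of temporal algorithms. A temporal algorithm is one that accepts input values that vary over time. If the algorithm supports short-term memory (that is, internal state), it automatically supports time-varying input. If the algorithm has no internal state, you need an input window and a prediction window, and most algorithms have no internal state. The stock market prediction example below shows how to use these two windows. Suppose you have the following closing prices for a stock over several days: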

 
  Day 1: $45
  Day 2: $47
  Day 3: $48
  Day 4: $40
  Day 5: $41
  Day 6: $43
  Day 7: $45
  Day 8: $57
  Day 9: $50
  Day 10: $41

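The first step is to normalize the data. This step is necessary whether or not your algorithm has internal state. To normalize, we convert each value into the percentage change from the previous day. The value for day 2, for example, becomes 0.04, because the price changed by about 4% between $45 and $47. After doing the same for every day, the data set looks like this (values are truncated to two decimal places):

  Day 2: 0.04
  Day 3: 0.02
  Day 4: -0.16
  Day 5: 0.02
  Day 6: 0.04
  Day 7: 0.04
  Day 8: 0.26
  Day 9: -0.12
  Day 10: -0.18

To create an algorithm that predicts the next day's stock price, consider how to encode the data in a form the algorithm can accept as input. The encoding depends on whether the algorithm has internal state, because an algorithm with internal state needs only the most recent days' input to predict the trend.

The problem is that many machine learning algorithms have no internal state. In that case, a sliding window is generally used to encode the data: the values for several previous days are used to predict the next day's value. Here we assume that the input is the normalized closing-price change for three consecutive days and the output is the change for the fourth day. Dividing the data above in this way yields the training data below, in which each item specifies the ideal output for a given input.

  [0.04,0.02,-0.16]->0.02
  [0.02,-0.16,0.02]->0.04
  [-0.16,0.02,0.04]->0.04
  [0.02,0.04,0.04]->0.26
  [0.04,0.04,0.26]->-0.12
  [0.04,0.26,-0.12]->-0.18

This encoding requires the algorithm to have three inputs and one output.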
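Both steps, the percentage change and the sliding window, can be sketched as follows. The function names are hypothetical, and truncating to two decimal places is an assumption chosen so that the values line up with the sliding-window training data listed above.

```python
def percent_changes(prices):
    """Change from the previous day, truncated to two decimal places."""
    changes = []
    for previous, current in zip(prices, prices[1:]):
        # int() truncates toward zero, e.g. 4.44 -> 4 and -16.67 -> -16.
        hundredths = int((current - previous) * 100 / previous)
        changes.append(hundredths / 100)
    return changes

def sliding_windows(series, input_size=3):
    """Pair each run of input_size values with the value that follows it."""
    return [(series[i:i + input_size], series[i + input_size])
            for i in range(len(series) - input_size)]

closing = [45, 47, 48, 40, 41, 43, 45, 57, 50, 41]
changes = percent_changes(closing)
training = sliding_windows(changes)
```

Ten closing prices give nine changes and six training items, each with three inputs and one ideal output. Widening the input window to five days only requires passing input_size=5.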

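1.4 Understanding training

What is training, in essence? Training is the process by which an algorithm is fit to the training data. This is different from the "internal state" mentioned earlier: you can think of training as shaping long-term memory. For a neural network, what training changes is the weight matrix.

When training takes place depends on the algorithm. Generally, training and actual use are two clearly separate phases, but there are cases in which training and use go on at the same time.

1.4.1 Evaluating success

In school, students are graded as they learn a subject. Grades serve many purposes, the most basic of which is to give feedback on their learning. In the same way, you must evaluate the performance of your algorithm during training. The evaluation both guides training and provides feedback on its results.

One way to evaluate is with a scoring function, which takes the trained algorithm and evaluates it. The scoring function returns a single score, and our goal is to drive that score toward its maximum or its minimum. For any given problem, whether to maximize or minimize is arbitrary and depends only on how the scoring function is defined.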
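As one possible scoring function, the sketch below computes the mean squared error between a model's outputs and the ideal outputs, a score that is minimized. Mean squared error is an assumption chosen for illustration, since the text does not name a particular score. The two candidate models are hypothetical stand-ins for trained algorithms.

```python
XOR_TRAINING = [
    ([0.0, 0.0], [0.0]),
    ([1.0, 0.0], [1.0]),
    ([0.0, 1.0], [1.0]),
    ([1.0, 1.0], [0.0]),
]

def mse_score(model, training_set):
    """Mean squared difference between actual and ideal outputs."""
    total, count = 0.0, 0
    for inputs, ideal in training_set:
        actual = model(inputs)
        for a, i in zip(actual, ideal):
            total += (a - i) ** 2
            count += 1
    return total / count

def perfect_model(inputs):
    """Stand-in for a well-trained algorithm."""
    return [1.0 if inputs[0] != inputs[1] else 0.0]

def undecided_model(inputs):
    """Stand-in for an untrained algorithm that always answers 0.5."""
    return [0.5]

good = mse_score(perfect_model, XOR_TRAINING)
poor = mse_score(undecided_model, XOR_TRAINING)
```

The lower score identifies the better model here. A score that counts correct answers instead would be maximized, which is the sense in which the direction depends only on how the function is defined.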

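1.4.2 Batch and online training

Batch training and online training are two types of training process, and the distinction usually comes into play when a training set is processed. In online training, the algorithm learns once for each element of the training set as it is presented. In batch training, the algorithm learns from a specified number of training set elements at once and is updated after the whole batch. The number of elements in a batch is called the "batch size", and it is often equal to the size of the entire training set.

Online training is useful when learning must take place while the algorithm is in use. The human brain works in this mode, but it is less common in artificial intelligence, and not all algorithms support it. Online training is, however, fairly common with neural networks.

1.4.3 Supervised and unsupervised training

This chapter has briefly touched on two very different training approaches: supervised training and unsupervised training. When the expected output is given to the algorithm, training is supervised; when no expected output is given, training is unsupervised.

There is also a hybrid approach, in which only part of the expected output needs to be provided. It is often used with deep belief networks.

1.4.4 Stochastic and deterministic training

A deterministic training algorithm always runs in exactly the same way given the same initial state. Such an algorithm generally uses no random numbers anywhere.

Stochastic training is different: it uses random numbers. As a result, the algorithm can produce entirely different training results even from the same initial state, which makes the performance of stochastic algorithms harder to evaluate. Nevertheless, stochastic algorithms are widely used and very effective.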

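1.5 Chapter summary

This chapter introduced some fundamentals of artificial intelligence, and of machine learning in particular. You learned how to model a problem for a machine learning algorithm. Machine learning algorithms resemble biological processes in some ways, but the goal of artificial intelligence is not to simulate the workings of the human brain completely. The goal is to go beyond simple procedural programs and build machines with a degree of intelligence.

Machine learning algorithms resemble the human brain in that both have inputs, outputs, and a hidden internal state, with the inputs and the internal state determining the output. The internal state can be regarded as short-term memory that influences the output. There is also a property called long-term memory, which specifies exactly what the algorithm will output for a given input and internal state. Training is the process of adjusting long-term memory so that the algorithm produces the expected output.

Machine learning algorithms are commonly divided into two broad categories: regression algorithms and classification algorithms. A regression algorithm returns numeric output for one or more given inputs. It is essentially a multivariate function of multiple inputs, and its output may be a single value or several values.

A classification algorithm accepts one or more inputs and returns a class, which the algorithm chooses on the basis of the inputs. For example, a classification algorithm could sort job applicants into a preferred group, a backup group, and a rejected group.

This chapter showed that the input to a machine learning algorithm is a numeric vector. To apply an algorithm to a problem, it is essential to understand how to express that problem as a numeric vector.

Chapter 2 introduces the concept of normalization in more detail. Normalization refers broadly to the methods used to preprocess data into a form suitable as input to an algorithm. It is also used to interpret the output of a machine learning algorithm.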


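This article is excerpted from 《人工智能算法 卷1 基础算法》 (Artificial Intelligence Algorithms, Volume 1: Fundamental Algorithms) by Jeffery Heaton (US), translated by 李尔超 (Li Erchao).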

