Basic introduction to machine learning concepts

This article introduces the basic ideas and prospects of machine learning, to build an initial picture of the knowledge system around it. It also compares machine learning with two closely related fields, data mining and artificial intelligence, covering their similarities and differences.

Note: these are study notes on Lin Xuantian's (Hsuan-Tien Lin's) course Machine Learning Cornerstone (Machine Learning Foundations), taught in Mandarin.

1.1 Purpose of machine learning

In short, machine learning processes data (often a huge amount of it) with suitable techniques in order to improve some measure of performance, such as prediction accuracy or return on investment.

1.2 Application of machine learning

1.2.1 Understand how machine learning solves problems

A human learns to recognize a tree from a large number of examples, often without being aware of the training (a three-year-old child can normally do it). If instead you write a conventional program, without machine learning, to decide whether the subject of an image is a tree, you typically need hundreds of lines of code and an explicit definition of a tree's visual features, and the result may still fall short of what you want. Moreover, the world cannot always be captured by hand-written rules and features: sometimes we simply cannot think of a program, unrelated to learning, that performs the task.

A robot sent to Mars will face many situations, and it is impossible to write rules that cover every one of them in advance. Traditional rule-based methods may therefore fail, and the machine may need to learn to distinguish situations by itself.

Speech recognition is another problem that troubled the computing industry for a long time, because it is hard to define what a sound is and how to interpret it, let alone how to understand language.

Some phenomena call for predictions or rapid short-term judgments that humans cannot make or reason out themselves, and that cannot be covered by rules at all.

Machine learning is a bit like the proverb about fishing: instead of handing the machine an answer every time, we keep teaching it how to find answers, so that it can gradually do the job on its own.

1.2.2 Practical application of machine learning in daily life

  • Learn from posts on Twitter to detect the likelihood of food poisoning (poor hygiene) when customers visit a restaurant.

  • By studying how well-dressed customers like to combine clothes, together with rating data on outfit images, recommend outfits to customers that match popular taste.

  • Using feature data of buildings whose energy consumption is already known, predict the energy consumption of other buildings, helping to save energy and reduce carbon emissions.

  • Using large collections of real-road traffic sign images labeled with their meanings, improve the accuracy with which self-driving cars recognize traffic signs.

  • Use a student answering system to collect how students understand and solve problems, then adjust the problems to improve the efficiency and accuracy of students' answers.

Machine learning can help improve such a learning system. In one case, about 9 million answer records from 3,000 students were given to a machine learning program, which "reverse-engineered" from the data how the students were learning and estimated the difficulty of each question.

Based on the public's ratings of a large number of movies, the system recommends movies to a user (possibly ones the user has not seen yet, recommended because they contain elements the user is likely to enjoy), so that the recommendations are tailored to that user's preferences.

Basically, a recommender system learns our preferences for certain things. Human preferences can be summarized as how attractive we find certain features of things; the unknown rule behind them is the objective function f. One possible model describes a viewer by how much they like each feature and a movie by how much of each feature it contains, and takes the inner product of the two vectors: the higher the result, the more likely the movie is recommended. The model is learned from previously collected rating data (training data D), yielding a hypothesis g; from it we infer an individual's likely preferences (input data) and make personalized recommendations (output).

Notes:

  1. The recommendation process can be seen as a transition from the relationship between the public and the features of things to the relationship between an individual and those features.
  2. The inner product of two vectors (also called the dot product) is $\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$.
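
The inner-product scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's actual model: the three features (comedy, action, romance), the 0-5 values, and the movie names are all hypothetical.

```python
# Minimal sketch of inner-product scoring for recommendation.
# Feature order (hypothetical): [comedy, action, romance], values on a 0-5 scale.


def inner_product(a, b):
    """Dot product: the sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def recommend(user_prefs, movies, top_k=2):
    """Rank movies by the inner product of the user's preferences and each movie's features."""
    scored = [(title, inner_product(user_prefs, feats)) for title, feats in movies.items()]
    # A larger inner product means a better match, so sort in descending order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    user = [5, 1, 3]  # how much this hypothetical user likes each feature
    movies = {        # how much of each feature each hypothetical movie contains
        "Movie A": [4, 0, 2],
        "Movie B": [1, 5, 0],
        "Movie C": [1, 1, 5],
    }
    print(recommend(user, movies))  # [('Movie A', 26), ('Movie C', 21)]
```

Movie B scores only 10 because this user cares little about action, which is the feature Movie B is strongest in.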

1.3 When to use machine learning

1.3.1 A clear objective

A machine learning program needs a reasonable, definable goal or rule, so that the machine can move toward solving a practical problem, for example:

  • Recognition

  • Risk analysis

  • Prediction

  • Optimization

    ······

1.3.2 Rules that cannot be described or defined, and enough data

For problems whose rules cannot be described or hand-optimized, but whose optimization goal can be stated, machine learning often works surprisingly well. Compared with rigidly designing programs and defining intricate processing rules ourselves, it is more convenient and efficient.

Unfortunately, solving a problem with machine learning also requires enough data, covering the situations involved in the problem as fully as possible, for training. Machine learning is not a panacea.

1.3.3 Related exercises

Which of the following problems is best suited to machine learning?

  • Predicting whether a baby's next cry will happen in an odd-numbered or an even-numbered time period
  • Determining whether a given abstract graph contains a cycle
  • As a bank, deciding whether to issue a credit card to a customer based on an assessment of risk
  • Predicting whether the use of nuclear energy will lead to the destruction of the earth in the next ten years

Answer: option 3 is the one best suited to machine learning. Explanation of each option:

  1. There is a goal, but no pattern: nothing about the environment (time, place, people, etc.) determines whether the crying falls in an odd or even period. If it turned out that certain environmental conditions did make crying in odd or even periods more likely, machine learning could be trained on such data (for example, videos of the baby's daily life).
  2. This problem is easy to solve with an ordinary program, so machine learning is unnecessary.
  3. With a conventional program, you would face huge amounts of data and undefined association rules. With machine learning, there is a clear learning goal (issue a card or not), and the bank has stored data on its credit card customers over the past ten years, which is a reasonably sufficient amount of data for this problem.
  4. There is not enough data to infer whether the earth will be destroyed in the next ten years: there is no record of enough nuclear-caused catastrophes, their causes, or how they manifest, from which to learn and explain a possible destruction of the earth.
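
To see why option 2 needs no learning, here is a minimal sketch that assumes the option asks whether a given undirected graph contains a cycle (an assumption about the wording of the original exercise). A standard union-find pass answers the question exactly, with no data or training at all.

```python
def has_cycle(num_vertices, edges):
    """Return True if the undirected graph on vertices 0..num_vertices-1 contains a cycle."""
    parent = list(range(num_vertices))  # each vertex starts in its own component

    def find(v):
        # Follow parent links to the component root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # u and v are already connected, so this edge closes a cycle.
            return True
        parent[root_u] = root_v  # merge the two components
    return False


if __name__ == "__main__":
    print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True: a triangle
    print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))  # False: a simple path
```

The rule is exact and cheap, which is the point of option 2: when a precise program is easy to write, there is nothing for a learner to add.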

Machine learning has achieved notable results in finance and economics (for example, predicting the rise and fall of stocks, although this is still developing and the results are not yet good), medicine (predicting drugs), law (generating document summaries), and other fields.

1.4 Symbolic representation of machine learning

1.4.1 Understanding the general notation of machine learning through the credit card case

How do we start applying machine learning to the problem above? A program lives in a world of logic, so a real problem must first be expressed abstractly. This involves some common notation in machine learning, explained below with the example.

Symbol | Meaning
X | Input data: the customer's application data, i.e., the data to be judged
Y | Output data: whether to issue the card, i.e., the final answer for an input
f | Objective (target) function f: X → Y, the unknown ideal rule that cannot be simply described
D | Training data: credit card customer records from previous years
g | Hypothesis: the formula actually used to decide whether to issue a card; the closer g is to f, the better

Note: a machine learning problem is defined by an ideal objective function f. Precisely because f cannot be expressed perfectly, we approximate it with g; with enough data and better algorithms, g can be brought as close to f as possible.

In the credit card case, machine learning might select, or combine, hypotheses such as the following:

$g \in H = \{h_k\}$, where $H$ is the set of candidate hypotheses

$h_1$: personal annual income $\geq$ 180,000 RMB

$h_2$: liabilities $\geq$ 200,000 RMB

$h_3$: years of employment $\leq$ 2

······

How the algorithm finally forms specific hypotheses depends on the actual data and conditions; the rules above are only illustrations for easy understanding.

Machine learning starts from the data D and calculates a hypothesis formula g close to the objective function f, so as to predict the output data Y corresponding to the new input data X.
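
The pipeline from D and H to g can be sketched as a toy learning algorithm that picks, from a small hypothesis set, the hypothesis with the fewest mistakes on the training data. Everything here is a hypothetical illustration: the thresholds follow h1 to h3 above, but the approve/reject direction of each rule, the `Customer` fields, and the six training records are assumptions made up for the example.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Customer(NamedTuple):
    """Input x: a customer's application data (hypothetical fields)."""
    income: int        # annual income, RMB
    liabilities: int   # outstanding debt, RMB
    years_worked: int  # years of employment


# Hypothesis set H: each h maps an input x to True (issue a card) or False (reject).
# Thresholds follow h1-h3 in the text; the approve/reject direction is an assumption.
H: Dict[str, Callable[[Customer], bool]] = {
    "h1": lambda x: x.income >= 180_000,      # approve high income
    "h2": lambda x: x.liabilities < 200_000,  # reject liabilities >= 200,000
    "h3": lambda x: x.years_worked > 2,       # reject <= 2 years of employment
}

# Hypothetical training data D: (x, y) pairs, y = whether the customer turned out well.
D: List[Tuple[Customer, bool]] = [
    (Customer(250_000, 50_000, 5), True),
    (Customer(120_000, 10_000, 1), False),
    (Customer(300_000, 250_000, 8), True),
    (Customer(90_000, 220_000, 3), False),
    (Customer(200_000, 30_000, 1), True),
    (Customer(150_000, 80_000, 6), False),
]


def training_errors(h: Callable[[Customer], bool], data: List[Tuple[Customer, bool]]) -> int:
    """Count the examples in data on which h disagrees with the recorded label y."""
    return sum(1 for x, y in data if h(x) != y)


def learn(hypotheses: Dict[str, Callable[[Customer], bool]],
          data: List[Tuple[Customer, bool]]) -> Tuple[str, Callable[[Customer], bool]]:
    """Toy learning algorithm: choose g as the hypothesis in H with the fewest errors on D."""
    best = min(hypotheses, key=lambda name: training_errors(hypotheses[name], data))
    return best, hypotheses[best]


if __name__ == "__main__":
    name, g = learn(H, D)
    print(f"g = {name}, errors on D = {training_errors(g, D)}/{len(D)}")  # g = h1, errors on D = 0/6
    new_applicant = Customer(190_000, 40_000, 1)  # a new input X
    print("issue card:", g(new_applicant))  # issue card: True
```

Picking the hypothesis with the fewest mistakes on D is only the simplest possible learning algorithm; fitting D well does not by itself guarantee that g is close to f on new customers.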

1.4.2 Related exercises

Identify which of the statements correspond to X, Y, D, g and f.

From $s_3$ and the problem description, it is easy to see that $s_3$ is the objective function f: it describes the relationship between song recommendations and song factors, and comparing it with the other statements shows that the song factors relate to the user ID, the song ID and the song rating. $s_2$ describes a possible user ID and song ID, so it could be either input data or training data. $s_4$, which pairs millions of user IDs and song IDs with their ratings, settles the question: $s_2$ is the input data rather than training data (a single record is not enough to learn a hypothesis from), and $s_4$ is the training data. Therefore $s_1$, the song recommendation score on a scale from 0 to 100, is the output. In summary: X = $s_2$, Y = $s_1$, D = $s_4$, f = $s_3$.
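
To make these roles concrete, here is a toy sketch assuming D (like $s_4$) is a list of (user ID, song ID, score) triples. The learned g simply predicts a song's average past score and ignores the user ID, a deliberately crude, hypothetical stand-in for a real recommender; it only shows that g is computed from D, takes an X like $s_2$, and returns a Y like $s_1$, while the true f ($s_3$) stays unknown. All IDs and scores are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical D (like s4): past (user_id, song_id, score) records, score on a 0-100 scale.
D = [
    ("u1", "song_a", 90),
    ("u2", "song_a", 70),
    ("u1", "song_b", 40),
    ("u3", "song_b", 60),
]


def learn(data):
    """Compute a crude hypothesis g from D: a song's average past score."""
    scores_by_song = defaultdict(list)
    for _user_id, song_id, score in data:
        scores_by_song[song_id].append(score)
    song_means = {song: mean(scores) for song, scores in scores_by_song.items()}
    overall_mean = mean(score for _, _, score in data)  # fallback for unseen songs

    def g(x):
        """Map an input X = (user_id, song_id), like s2, to an output Y, like s1."""
        _user_id, song_id = x  # this toy g ignores the user ID
        return song_means.get(song_id, overall_mean)

    return g


if __name__ == "__main__":
    g = learn(D)
    print(g(("u3", "song_a")))  # 80: mean of 90 and 70
    print(g(("u2", "song_c")))  # 65: unseen song, so the overall mean
```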

1.5 Similarities and differences between machine learning and other fields

1.5.1 Machine Learning and Data Mining

Machine learning and data mining sometimes coincide and sometimes complement each other; the two are very similar and closely intertwined. If the goal of data mining is to find interesting or useful relationships between things in a specific application scenario, machine learning may be an excellent tool for it, and in that case the two are effectively the same (the KDD Cup, for example, is a data mining competition that is often also a machine learning competition). If the goal of data mining is to explore the various characteristics of data so that we can analyze it better, machine learning acts as an assistant, a powerful tool for completing the mining. That does not make the two inseparable: data mining sometimes focuses more on finding better methods for processing and computing over huge amounts of data, and there machine learning may not be the right direction. (In practice, people proficient in data mining are often also quite accomplished in machine learning.)

1.5.2 Machine Learning and Artificial Intelligence

Artificial intelligence, in layman's terms, means using computers to perform intelligent behaviors that are not mere repetition and can adapt to conditions. Machine learning is currently recognized as an excellent way to realize artificial intelligence. How does that work? Take playing chess: one possible approach is to let the machine keep playing through a large number of existing games, and to design a good algorithm so that it keeps training and exploring which moves lead to winning or losing, learning to avoid the situations that lead to defeat. Finally, the trained program plays against real chess masters to further refine the training results. (This is only an intuitive picture; actual training is more complicated.)

1.5.3 Related exercises

Data mining is not the same discipline as machine learning; the two differ in some directions.

Origin blog.csdn.net/yumuing/article/details/129274732