Section 1: Introduction to Machine Learning and scikit-learn

Table of contents

 0. Introduction

 1. Introduction to Supervised Learning

Types of Supervised Learning

2. Introduction to Unsupervised Learning

3. Introduction to scikit-learn


 0. Introduction

        Machine learning is becoming increasingly popular, and the barrier to entry keeps getting lower. Thanks to excellent frameworks and tools, beginners can quickly get started on a machine learning project and apply machine learning algorithms to their own data. This course begins by introducing several basic concepts in machine learning.

Key points

  • Supervised Learning Concepts
  • Unsupervised Learning Concepts
  • scikit-learn tools

 1. Introduction to Supervised Learning

        Machine learning is usually divided into several types, such as supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Of these, supervised learning is one of the most common and widely used branches.

        The goal of supervised learning is to learn a predictive model from known training data, so that the model can produce a predicted output for new input data. The "supervised" in supervised learning is defined in contrast to "unsupervised": the training data for supervised learning has been labeled, typically by hand, whereas the data for unsupervised learning has not.

        Take the two simple datasets in the figure as an example. The dataset on the left is clearly unlabeled. The dataset on the right is color-labeled, that is, its samples have been manually tagged with orange, green, and blue labels.
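The difference between an unlabeled and a labeled dataset can be made concrete with a small sketch. The point coordinates below are made up for illustration, and the names `X`, `y`, `unlabeled`, and `labeled` are assumptions of this example, not part of the original figure.

```python
import numpy as np

# Made-up 2-D sample points (illustrative values only).
X = np.array([
    [1.0, 1.2], [0.8, 1.0],   # first group
    [5.0, 5.1], [5.2, 4.9],   # second group
    [9.0, 1.0], [9.1, 1.3],   # third group
])

# Unlabeled dataset: the feature matrix is all there is.
unlabeled = X

# Labeled dataset: the same features plus one manually assigned label per sample.
y = np.array(["orange", "orange", "green", "green", "blue", "blue"])
labeled = list(zip(X.tolist(), y.tolist()))

for features, label in labeled:
    print(features, label)
```

Supervised learning algorithms consume both `X` and `y`, while unsupervised algorithms only ever see `X`.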

Types of Supervised Learning

        The problems addressed by supervised learning fall roughly into two categories: classification and regression.

        A classification problem can be summarized as follows: given a set of data samples whose categories are known, learn rules from the features of those samples, then use the rules to decide which category a new input sample belongs to. For example, the figure below shows a supervised learning algorithm classifying fruits.

        Examples of classification are everywhere. Dividing supervised learning problems into classification and regression, as we are doing here, is itself one. When you face a new problem, you decide whether it is a classification or a regression problem by looking at its characteristics.

        The biggest difference between a regression problem and a classification problem is the type of the output variable. In detail:

  • For classification problems, the output takes one of a finite number of discrete values, such as a Boolean value or a categorical variable.
  • For regression problems, the output is a continuous variable, usually a real number, that is, a specific numeric value.

        For example, predicting a person's gender is a typical classification problem, whereas predicting a person's age is usually treated as a regression problem.
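The distinction between discrete and continuous outputs can be seen in a minimal sketch. It assumes scikit-learn and NumPy are installed and uses a made-up toy dataset (hours studied, pass/fail, exam score) rather than the gender and age example, purely to keep the data small. `LogisticRegression` and `LinearRegression` are just two convenient estimators, not the only possible choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up toy feature: hours studied per week (a single column).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])

# Classification target: a discrete label (0 = fail, 1 = pass).
y_class = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Regression target: a continuous value (exam score), built as 10 * hours + 5.
y_reg = 10.0 * X.ravel() + 5.0

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)

X_new = np.array([[1.5], [7.5]])
class_pred = clf.predict(X_new)  # discrete labels drawn from {0, 1}
reg_pred = reg.predict(X_new)    # real-valued predictions

print("classification output:", class_pred)
print("regression output:", reg_pred)
```

The classifier can only ever return one of the labels it saw during training, while the regressor can return any real number.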

2. Introduction to Unsupervised Learning

        What is unsupervised learning? Broadly speaking, it is the counterpart of supervised learning. Supervised learning requires the training data to be labeled, which is an essential step. The data used in unsupervised learning, by contrast, is not labeled.

        For example, suppose we have a collection of animal pictures. In supervised learning, we need to label in advance which animal each photo shows (this one is a dog, that one is a cat) and then train a model. The trained model can then tell which category of animal a newly input photo belongs to.

        In unsupervised learning, the photos are not labeled. We "feed" all of the training photos to the algorithm, and the result differs from supervised learning: the algorithm can only find that the training samples contain several groups of animals, but it cannot tell you directly which group is cats and which is dogs. However, the number of groups is usually small, so you can label the groups by hand and then use the data for other purposes.

        In the example above, unsupervised learning separated the samples into several groups, which is what we usually call "clustering". The figure below demonstrates an unsupervised clustering process.

 

        Of course, clustering is only one of the main problems addressed by unsupervised learning. Unsupervised learning also covers other applications, such as principal component analysis. In machine learning, when the data we use has no labels, the problem can generally be treated as an unsupervised learning problem.
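Both tasks mentioned above, clustering and principal component analysis, can be sketched on the same unlabeled data. This is a minimal illustration assuming scikit-learn and NumPy are installed. The three groups of points are randomly generated stand-ins for the animal photos, and `KMeans` is only one of several clustering algorithms that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Made-up unlabeled data: three well-separated groups of 3-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 0.0], [-10.0, 10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 3)) for c in centers])

# Clustering: ask for three groups. The algorithm only ever sees X, never labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_  # arbitrary integers 0..2, not category names

# Collect the cluster ids assigned to each generated group of 20 points.
groups = [set(cluster_ids[i * 20:(i + 1) * 20].tolist()) for i in range(3)]
print("cluster ids per generated group:", groups)

# Principal component analysis: project the 3-D points onto 2 components.
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)
explained = pca.explained_variance_ratio_.sum()
print("reduced shape:", X_2d.shape)
print("share of variance kept:", explained)
```

The cluster ids are arbitrary integers. As the text notes, deciding which id means "cat" and which means "dog" still requires a person to inspect each group.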

3. Introduction to scikit-learn

        There are many commonly used machine learning methods, such as linear regression, support vector machines, k-nearest neighbors, decision trees, naive Bayes, and logistic regression. Some of these methods involve fairly complicated mathematics. If you had to implement these algorithms from scratch every time, the barrier to entry for machine learning would be very high.

        scikit-learn is a very popular open-source machine learning library for Python. It wraps many machine learning algorithms, so a model that would otherwise be complex to implement can often be built with a few lines of code and simple method calls.
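As a rough illustration of how little code is needed, the sketch below trains and scores a k-nearest neighbor classifier. It assumes scikit-learn is installed. The bundled iris dataset, the 75/25 split, and `n_neighbors=5` are choices made for this example, not recommendations from the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labeled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Create, train, and evaluate the model with three method calls.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print("test accuracy:", accuracy)
```

Other estimators, such as a decision tree or a support vector machine, follow the same `fit` and `score` pattern, so swapping the algorithm usually means changing only the line that creates `model`.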

        In addition, scikit-learn provides a set of tools that surround the core algorithms, including data preprocessing, model evaluation, and hyperparameter optimization. These are very useful when doing machine learning modeling with Python.
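The three kinds of tooling named above can be combined in one short sketch. It assumes scikit-learn is installed. The choice of `StandardScaler`, `SVC`, the candidate values for `C`, and 5-fold cross-validation are all illustrative assumptions rather than settings taken from the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data preprocessing: scale each feature before it reaches the classifier.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hyperparameter optimization: try several values of C with 5-fold cross-validation.
# make_pipeline names each step after its lowercased class name, hence "svc__C".
param_grid = {"svc__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Model evaluation: score the best configuration on the held-out test set.
best_C = search.best_params_["svc__C"]
test_accuracy = search.score(X_test, y_test)

print("best C:", best_C)
print("cross-validated accuracy:", search.best_score_)
print("test accuracy:", test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation training fold, so the validation folds do not leak into the preprocessing step.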

        Therefore, in this course we will start with scikit-learn to understand and master the process and methods of machine learning modeling.
