Python Machine Learning and Practice 1: Introduction

1. Machine Learning Overview:

Machine learning systems solve problems that cannot be handled directly with fixed rules or hand-written procedural code.

A program capable of "learning" is one that keeps learning from experience and data so that it can handle future prediction tasks. We usually call this ability to predict the unknown generalization.

The book focuses on two fundamental areas: supervised learning and unsupervised learning.

============

Supervised learning includes regression and classification.

Regression: a prediction problem in which the target is usually a continuous variable. For example, predicting a house's sale price from its floor area, location, age, and so on is a regression task, because the sale price is a continuous variable. Linear regression in statistics is a classic case: it uses a linear function to fit a set of discrete data points as closely as possible. Once you have the fitted function, you can use it to predict values for new data.
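To make the idea of "fitting a linear function" concrete, here is a minimal sketch of simple linear regression by ordinary least squares in plain Python. The house areas and prices are made-up illustrative numbers, and `fit_line` is a hypothetical helper name, not a library function.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: floor area -> sale price (arbitrary units)
areas = [50, 70, 90, 110, 130]
prices = [150, 205, 260, 320, 375]

slope, intercept = fit_line(areas, prices)

# Use the fitted function to predict a continuous value for an unseen house
predicted = slope * 100 + intercept
print(f"price = {slope:.3f} * area + {intercept:.3f}")
print(f"predicted price for area 100: {predicted:.2f}")
```

A real model would use many features (location, age, and so on) rather than area alone; a single feature keeps the arithmetic easy to follow.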

Classification: predicting the category a sample belongs to. For example, a neural network trained to recognize pictures of cats and dogs can be given a picture it has never seen, and it will recognize which of the two categories the picture belongs to.
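A real cat/dog recognizer would train a neural network on raw pixels. The sketch below swaps in a much simpler technique, a nearest-centroid classifier, just to show the shape of a classification task: learn from labeled examples, then assign a category to an unseen sample. The two-number "features" per picture and the label convention (1 = cat, 0 = dog) are hypothetical.

```python
import math

def train_centroids(features, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Hypothetical training set: two made-up numeric features per picture
train_x = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.3, 0.8]]
train_y = [1, 1, 0, 0]  # 1 = cat, 0 = dog

centroids = train_centroids(train_x, train_y)

# Classify a picture the model has never seen
print(predict(centroids, [0.85, 0.25]))
```

`math.dist` requires Python 3.8 or later.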

============

Unsupervised learning includes data dimensionality reduction, clustering, and so on.

Data dimensionality reduction: compressing and filtering the features used to describe things. This task is relatively abstract.

For example, in the task of recognizing faces in an image, we can read the image's pixel values directly. If these pixel values are used as-is, the data will have a very high dimensionality, especially as image resolutions keep increasing. Therefore, we usually apply dimensionality reduction techniques to reduce the dimensionality of the image data while retaining the most discriminative combinations of pixels.
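One widely used dimensionality reduction technique is principal component analysis (PCA); the text above does not name a specific method, so PCA is an assumption here. This plain-Python sketch reduces 2-D points to 1-D by finding the direction of largest variance with power iteration on the 2x2 covariance matrix. The function names and data are hypothetical, and real image data would have far more dimensions.

```python
import math

def pca_first_component(points, iterations=100):
    """Return (mean, direction) of the largest-variance axis of 2-D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [[p[0] - mean[0], p[1] - mean[1]] for p in points]
    # 2x2 covariance matrix of the centered data
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(2)] for i in range(2)]
    # Power iteration: repeatedly multiply by cov and normalize;
    # the vector converges to the eigenvector with the largest eigenvalue
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return mean, v

def project(points, mean, direction):
    """Reduce each 2-D point to a single coordinate along `direction`."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

# Hypothetical 2-D samples lying close to the line y = x
points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
mean, direction = pca_first_component(points)
print(direction)
print(project(points, mean, direction))  # one number per sample instead of two
```

This toy version skips edge cases (for example, a start vector orthogonal to the top eigenvector); libraries such as scikit-learn provide a complete PCA implementation.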

Clustering: grouping data samples into clusters based on their similarity, so that similar samples end up in the same cluster. As the name suggests, clustering puts similar things together; the saying "birds of a feather flock together" sums it up well.
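k-means is one classic clustering algorithm (chosen here for illustration; the text does not prescribe one). This minimal sketch alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. No labels are used. The data and the choice of initial centroids are assumptions; libraries usually pick initial centroids automatically (scikit-learn's `KMeans` defaults to k-means++).

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        assignment = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centroids

# Hypothetical unlabeled 2-D samples forming two groups
points = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]]

# Start from one point in each region so the result is deterministic
assignment, centroids = kmeans(points, [points[0], points[3]])
print(assignment)
print(centroids)
```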

=============

In summary

Supervised learning: Generally, the data samples we train on have informative features that reflect the underlying patterns in the data. We condense these features into a feature vector that represents the sample. For example, each of us has an ID card with a unique ID number; that number can be thought of as a feature vector that uniquely identifies its holder.

In addition to the feature vector that represents it, each sample also has a label, which names its category in the classification sense described earlier. In the cats-and-dogs example, all the cat photos can be given label=1 and all the dog photos label=0, so samples with the same label belong to the same category.
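In code, a supervised dataset is often stored as two parallel sequences: the feature vectors (commonly called `X`) and the labels (`y`). This is a minimal sketch with three made-up numeric features per sample, using the cat = 1, dog = 0 convention; the variable names are a common convention, not something the text requires.

```python
# Hypothetical supervised dataset: each row of X is one sample's feature vector,
# and y holds the matching label (1 = cat, 0 = dog).
X = [
    [0.9, 0.2, 4.1],
    [0.8, 0.3, 3.8],
    [0.2, 0.9, 9.5],
]
y = [1, 1, 0]

# Every feature vector needs exactly one label
assert len(X) == len(y)

for features, label in zip(X, y):
    name = "cat" if label == 1 else "dog"
    print(features, "->", name)

# An unsupervised task would receive only X, with no y
```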

Unsupervised learning: Naturally, there are no labels, so it cannot be used directly for prediction tasks; it is better suited to analyzing the structure of data. On the other hand, labeling data for supervised learning often takes a lot of time, money, and manpower, so labeled datasets tend to be relatively small.


2. Python programming:

Python features:

An interpreted language that is easy to debug: Python is an interpreted programming language; the interpreter executes code statement by statement, so code can be run and checked incrementally. This makes debugging easier and is especially suitable for incremental development when experimenting with different machine learning models.

Cross-platform: Before it is executed, Python source code is compiled into bytecode. As a result, Python code can run on different platforms (Windows, macOS, Linux) as long as each one has a Python virtual machine (interpreter) installed to execute that bytecode, much like the Java Virtual Machine.
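The compile-to-bytecode step can be observed from Python itself. This sketch uses the built-in `compile()` and the standard-library `dis` module to show the bytecode that the interpreter's virtual machine runs. The exact instructions printed depend on the Python version (bytecode is portable across operating systems, not across interpreter versions), so no output is shown.

```python
import dis

source = "total = sum(x * x for x in range(4))"

# compile() turns source text into a code object that holds the bytecode
code = compile(source, "<example>", "exec")
print(type(code).__name__)
print(code.co_code[:8])  # the first raw bytecode bytes

# dis disassembles the bytecode into readable instructions
dis.dis(code)

# The virtual machine executes the same code object on any platform
namespace = {}
exec(code, namespace)
print(namespace["total"])
```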

Extensive programming interfaces: Besides the third-party libraries that programmers use in their own development, many well-known companies offer cloud platforms for scientific research and business, such as Amazon's AWS (Amazon Web Services) and the Google Prediction API, and these platforms provide Python application programming interfaces for their machine learning functions. On many of them, users do not need to write the machine learning modules themselves; they only need to connect the modules together in Python, following the API's conventions and rules, like building blocks.

Rich and complete open-source toolkits: Usually, we don't program from scratch. For example, learning algorithms often involve vector calculations; if Python did not directly provide tools for vector calculations, would we have to spend time writing such basic functions ourselves? The answer is no. Because Python is free and open source, a large number of professional, even brilliant, programmers have contributed to its third-party open-source toolkits (program libraries). Even better, most of these toolkits are free for personal and even commercial use. They include the machine learning libraries mainly used in this book: NumPy and SciPy for vector, matrix, and other complex scientific calculations; Matplotlib for MATLAB-style plotting; Scikit-learn, which provides a large number of classic machine learning models; Pandas for fast data analysis and processing; and Anaconda, a comprehensive platform that bundles all of the libraries above.
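As a small illustration of "not writing basic functions yourself", the sketch below computes a dot product by hand and then with NumPy, followed by a matrix-vector product. It assumes NumPy is installed (for example through Anaconda); the vectors and matrix are arbitrary.

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Hand-written dot product: the kind of basic routine a library saves you from writing
manual_dot = sum(x * y for x, y in zip(a, b))

# The same operation with NumPy arrays
va, vb = np.array(a), np.array(b)
numpy_dot = np.dot(va, vb)

# Matrix-vector product with the @ operator
m = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
print(manual_dot, numpy_dot, m @ va)
```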

SciPy requires NumPy to install and run. Readers interested in these two libraries can refer to the following online documentation to learn their usage in detail:

http://www.numpy.org/

http://www.scipy.org/

Readers interested in more details can consult Matplotlib's online documentation: http://matplotlib.org/contents.html

For the Pandas toolkit, see http://pandas.pydata.org/pandas-docs/stable/ for detailed documentation.

http://matplotlib.org/

http://scikit-learn.org/

http://pandas.pydata.org/

Anaconda toolkit: https://www.continuum.io/documentation


3. Basic Python programming

Python has six common built-in data types: Number, Boolean, and String, plus the more complex Tuple, List, and Dictionary.

Number: Commonly used number types include integers (Integer), long integers (Long, in Python 2; Python 3 merges them into int), floating-point numbers (Float), and complex numbers (Complex). Readers can simply think of integers as the familiar whole numbers, such as 10, 100, and -100, while decimals commonly used in calculations, such as -0.1 and 10.01, can be stored using Python's floating-point type. Long integers and complex numbers (numbers with an imaginary part) are less commonly used, so I won't go into detail here.
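A quick way to see these number types is to ask Python for each value's type. This sketch is written for Python 3, where `int` has arbitrary precision, so even very large integers are plain `int` values.

```python
i = 100          # int
f = -0.1         # float
c = 3 + 4j       # complex: real part 3, imaginary part 4
big = 2 ** 100   # still an int; Python 3 ints grow as needed

for value in (i, f, c, big):
    print(value, type(value).__name__)

print(c.real, c.imag)  # the two parts of the complex number
print(abs(c))          # magnitude of the complex number
```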

Boolean: Computers compute in binary, so virtually every programming language has a data type for true/false. In Python, these two values have fixed spellings: True for true and False for false. Keep in mind that Python is case-sensitive, so only these exact spellings are interpreted as Boolean values; true or TRUE are not.
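The sketch below shows that comparisons produce `True`/`False`, and that a lowercase `true` is not a Boolean at all: Python treats it as an ordinary (here undefined) variable name and raises `NameError`.

```python
flag = True
print(type(flag).__name__, 3 > 2, 3 < 2)

# Case matters: `true` is just an undefined name, not a Boolean
try:
    value = true  # noqa: F821 (deliberately undefined)
    lowercase_is_boolean = True
except NameError:
    lowercase_is_boolean = False

print(lowercase_is_boolean)
```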

String: A string is a sequence of characters (Character) and is widely used, especially for processing text data. In Python, strings are written with a pair of single or double quotes: 'abc' or "123". Although 123 looks like an integer, once it is enclosed in single or double quotes it becomes string data.
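The difference between "123" and 123 shows up as soon as you operate on them: `+` concatenates strings but adds numbers. This sketch also uses the built-in `int()` to convert a numeric string before doing arithmetic.

```python
s1 = 'abc'
s2 = "123"          # quotes make this a string, not an integer
n = 123

print(type(s2).__name__, type(n).__name__)
print(s2 + "4")     # string concatenation
print(n + 4)        # integer addition
print(int(s2) + 4)  # convert the string to an int first
```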

These three types are Python's basic built-in data types and form the basis for representing and storing data. The data structures introduced next are more complex and are built from these three basic types.

Tuple: A tuple is an ordered sequence of values. It is written with a pair of parentheses (); for example, (1, 'abc', 0.4) is a tuple containing three elements. Readers will notice that the elements of a tuple do not have to be of the same type, which is a notable feature of Python. If the tuple above is called t, then t[0] is 1 and t[1] is 'abc'; that is, we can retrieve the data we need directly from the tuple by its index. Note in particular that, like most programming languages, Python starts indexing at 0, not 1.
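This sketch indexes the example tuple and then tries to change an element, which fails because tuples cannot be modified after they are created.

```python
t = (1, 'abc', 0.4)   # mixed element types are allowed
print(t[0], t[1], len(t))

try:
    t[0] = 2          # attempt to modify an element
    tuple_modified = True
except TypeError:
    tuple_modified = False  # tuples are immutable

print(tuple_modified)
```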

List: Lists work much like tuples but are written slightly differently: a list uses a pair of square brackets [] to hold its data, such as [1, 'abc', 0.4]. The key difference to remember is that Python lets you modify the elements of a list after it is created, whereas a tuple cannot be modified. Like tuples, lists can hold elements of different types.
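In contrast to the tuple case, the same kind of assignment works on a list, and a list can also grow:

```python
lst = [1, 'abc', 0.4]
lst[0] = 2          # lists can be modified in place
lst.append(True)    # and can grow
print(lst)
```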

Dictionary: This is a very practical and powerful data structure in Python; in data processing tasks in particular, dictionaries have become a mainstream way of storing data. Structurally, a dictionary holds a set of key-value pairs, enclosed in curly braces {}, such as {1: '1', 'abc': 0.1, 0.4: 80}. Note that the keys in a dictionary are unique but do not have to share a data type (each key must, however, be hashable, for example a number or a string; a list cannot be a key). Looking up the value for a key works like indexing a tuple or a list: if the dictionary above is the variable d, then d[1] is '1' and d['abc'] is 0.1.
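This sketch builds the example dictionary, looks up values by key, shows that assigning to an existing key replaces its value (keys stay unique), and shows that an unhashable list is rejected as a key.

```python
d = {1: '1', 'abc': 0.1, 0.4: 80}
print(d[1], d['abc'], d[0.4])

d['abc'] = 0.2        # same key: the old value is replaced, not duplicated
d['new'] = [1, 2]     # values can be of any type, even lists
print(d)

# Keys must be hashable, so a list cannot be used as a key
try:
    d[[1, 2]] = 'x'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)
```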

In addition, Python enforces strict indentation. Instead of using {} to delimit code blocks, Python uses indentation, conventionally 4 spaces per level (often typed with the Tab key in an editor set to insert spaces).
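In this sketch the structure of the function is expressed entirely by indentation: the function body is indented one level, and the body of each `if`/`elif` branch one level further. The function name is hypothetical.

```python
def classify_number(n):
    # The function body is indented 4 spaces
    if n > 0:
        # This line belongs to the if block (8 spaces)
        return "positive"
    elif n < 0:
        return "negative"
    return "zero"  # back at 4 spaces: runs only if neither branch returned

results = [classify_number(x) for x in (5, -3, 0)]
print(results)
```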

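There are three ways to write comments:

# xxx comments out a single line.

Multiple lines can be commented out in two ways: '''xxx''' (triple single quotes) or """xxx""" (triple double quotes). Strictly speaking, these are unused string literals rather than true comments, but they are commonly used this way.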
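All three styles in one place. The triple-quoted blocks are string literals that are evaluated and then discarded, which is why they can serve as multi-line comments.

```python
# This is a single-line comment; the interpreter ignores it.
x = 1  # A comment can also follow code on the same line

'''
Text in triple single quotes can span several lines.
Because the string is never used, it acts like a comment.
'''

"""
Triple double quotes work the same way.
"""

print(x)
```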


