Instantly Understand Automated Machine Learning (AutoML)!


Why automated machine learning?

For newcomers to machine learning, a major obstacle is that an algorithm's performance is affected by many design decisions. With the prevalence of deep learning, engineers must choose an appropriate neural network architecture, training procedure, regularization method, hyperparameters, and so on, all of which have a great impact on the algorithm's performance. Deep learning engineers are therefore jokingly called "parameter-tuning engineers".

The goal of automated machine learning (AutoML) is to make the above decisions in an automated, data-driven way. The user simply provides data, and the AutoML system automatically determines the best solution. Domain experts no longer need to fret over learning the many machine learning algorithms.

Automated machine learning includes not only the well-known topics of algorithm selection, hyperparameter optimization, and neural architecture search, but covers every step of the machine learning workflow:

  • Automated data preparation
  • Automated feature selection
  • Automated algorithm selection
  • Hyperparameter optimization
  • Automated pipeline / workflow construction
  • Neural architecture search
  • Automated model selection and ensemble learning

Hyperparameter Optimization

Learner models generally have two kinds of parameters: one kind can be estimated from the data, while the other cannot be estimated from the data and must instead be specified in advance based on human experience; the latter are called hyperparameters. Examples include C, the kernel, and gamma in support vector machines, or alpha in naive Bayes.
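As a concrete illustration of these hyperparameters, here is a minimal sketch; scikit-learn is an assumed example library (the original names only the parameters, not a library):

    # C, kernel and gamma for an SVM, and alpha for naive Bayes, are fixed
    # before training; the model's remaining parameters are then estimated
    # from the data by fit().
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC

    svm = SVC(C=1.0, kernel="rbf", gamma="scale")  # hyperparameters, chosen by hand
    nb = MultinomialNB(alpha=1.0)                  # smoothing strength, also a hyperparameter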

There are many ways to optimize hyperparameters:

The most common family is black-box optimization (black-box function optimization). In black-box optimization, the decision network is treated as a black box: we care only about inputs and outputs and ignore its internal mechanism. The decision network can usually be parameterized; when optimizing it, the first thing to consider is convergence.

The following kinds of methods are all black-box optimization:

  • Grid search (grid search)
    Grid search should be familiar to everyone: it optimizes the model by exhaustively evaluating every combination in a given set of parameter values. The problem with grid search is that it is very prone to the curse of dimensionality; its advantage is that it is very easy to parallelize.
  • Random search (random search)
    Random search approximates the optimal solution by evaluating randomly sampled points and keeping the best minimum found.

    In many cases random search gives better results than grid search, but neither method is guaranteed to find the optimal solution (a minimal sketch of both follows this list).
  • Bayesian optimization
    Bayesian optimization is an iterative optimization algorithm with two key components: a surrogate model assumed to fit the observed data, and an acquisition function used to decide which point to evaluate next. In each iteration, all observations so far are used to fit the surrogate model; the model is then used to predict a probability distribution over the objective, and the acquisition function decides which parameter point to try next, trading off exploration against exploitation. Compared with other black-box optimization algorithms, Bayesian optimization needs far fewer evaluations, which is why it is considered one of the better hyperparameter tuning algorithms.
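A minimal sketch of grid search versus random search, assuming scikit-learn as the tooling and an SVM on the digits dataset as a toy task (both are illustrative choices, not from the original text):

    # Grid search exhaustively tries every combination; random search samples
    # the same number of configurations from continuous distributions.
    from scipy.stats import loguniform
    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10],
                                "gamma": [1e-3, 1e-2, 1e-1]}, cv=5)
    grid.fit(X, y)  # 3 x 3 = 9 configurations, each cross-validated

    rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2),
                                      "gamma": loguniform(1e-4, 1e0)},
                              n_iter=9, cv=5, random_state=0)
    rand.fit(X, y)  # 9 randomly sampled configurations, same budget

    print(grid.best_params_, grid.best_score_)
    print(rand.best_params_, rand.best_score_)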

Some black-box optimization tools:

  • hyperopt
    hyperopt is a Python library for optimizing over search spaces that may contain real-valued, discrete, and conditional dimensions (a minimal usage sketch follows this list).
  • Google Vizier
    Google Vizier, the hyperparameter tuning service inside Google's machine learning systems, uses transfer learning and other techniques to automatically optimize the hyperparameters of other machine learning systems.
  • Advisor
    An open-source implementation of Google Vizier.
  • katib
    A hyperparameter optimization tool based on Kubernetes.
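A minimal hyperopt sketch: the toy objective below is a stand-in, and in practice fn would train a model and return its cross-validation error:

    # Minimize a toy objective with TPE over a mixed search space.
    from hyperopt import fmin, hp, tpe

    space = {
        "C": hp.loguniform("C", -5, 5),                    # continuous, log scale
        "kernel": hp.choice("kernel", ["rbf", "linear"]),  # discrete
    }

    def objective(params):
        # Stand-in loss; a real objective would cross-validate a model here.
        return (params["C"] - 1.0) ** 2 + (0.0 if params["kernel"] == "rbf" else 0.5)

    best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
    print(best)  # note: hp.choice entries are reported as indices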

Since the optimization objective may have properties such as discontinuity and non-differentiability, gradient-free search algorithms, including the black-box algorithms mentioned above, are used to solve the problem. Such algorithms search by sampling and evaluating the samples, and they often need a large number of evaluations to obtain good results. In automated machine learning tasks, however, each evaluation is usually obtained by k-fold cross-validation, and on large datasets the time cost of a single evaluation is huge; this limits how well these optimization algorithms work on AutoML problems. Methods for reducing the cost of evaluation have therefore been proposed, and multi-fidelity optimization (Multi-Fidelity Methods) is one of them. These techniques include deciding from the learning curve whether to terminate training early, and exploration-exploitation multi-armed bandit algorithms (Multi-armed bandit), among others.
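As one concrete example of the multi-fidelity idea, here is a from-scratch successive-halving sketch (the algorithm at the core of Hyperband; the evaluate function and budgets below are hypothetical stand-ins):

    # Successive halving: evaluate many configurations on a small budget,
    # keep the best half, double the budget, and repeat.
    import random

    def successive_halving(configs, evaluate, min_budget=1, rounds=4):
        budget = min_budget
        for _ in range(rounds):
            scored = sorted(configs, key=lambda c: evaluate(c, budget))
            configs = scored[: max(1, len(scored) // 2)]  # survivors
            budget *= 2                                   # higher fidelity
        return configs[0]

    # Toy usage: configs are learning rates; "error" is noisy at low budget.
    def evaluate(lr, budget):
        return abs(lr - 0.1) + random.random() / budget

    best = successive_halving([10 ** random.uniform(-4, 0) for _ in range(16)],
                              evaluate)
    print(best)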

There is also some research on hyperparameter optimization based on gradient descent.

Hyperparameter optimization faces many challenges:

  • For large models or complex machine learning pipelines, the space to be evaluated is enormous
  • The configuration space can be very complex
  • The gradient of the loss function is unavailable or very hard to obtain
  • The training set may be too small
  • It is easy to overfit


Meta-Learning

Meta-learning is "learning how to learn": by systematically observing the performance differences between existing learning tasks, and then learning from this experience, or meta-data, new learning tasks can be carried out better. The great benefit is that static pipeline design or neural network architecture design, previously hand-crafted, workshop-style algorithm engineering, can also be replaced by a data-driven approach.

In a sense, meta-learning subsumes hyperparameter optimization, because the meta-data to learn from includes hyperparameter settings, neural network architectures, model structures, meta-features, pipelines, and so on.

A machine learning algorithm is also called a "learner". A learner assumes a model with many unknown parameters, and uses the training data and a parameter-optimization algorithm to find the parameter values best suited to the training data, yielding a new algorithm, or a model whose parameters are now known, with which to predict on new data. If there were only one model, the problem would be simple; the problem is that there are many models, each with different hyperparameters, and we often assemble models and algorithms into composite machine learning pipelines, so we need to know which model to build to solve which problem. In meta-learning, we can instead treat hyperparameters, pipelines, and neural network architectures as the unknown parameters of a new model, treat the performance on different learning tasks as the input data, and then use an optimization algorithm to find the best set of these "parameters". This pattern can be nested indefinitely, so you can even have meta-meta-learning; of course, I hope you do not go so far that you cannot find your way back. A toy sketch of the idea follows.
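A toy sketch of one form of this idea: describe past tasks by simple meta-features and recommend, for a new task, the configuration that worked best on the most similar past task. All names and the stored "experience" here are hypothetical:

    # Recommend hyperparameters for a new task from past-task experience.
    import numpy as np

    # Meta-data from previous tasks: (n_samples, n_features) -> best config found.
    past_meta_features = np.array([[1000, 20], [50000, 100], [200, 5]])
    past_best_configs = [{"C": 1.0}, {"C": 100.0}, {"C": 0.01}]

    def recommend(n_samples, n_features):
        # Nearest neighbour in log-scaled meta-feature space.
        query = np.log1p([n_samples, n_features])
        dists = np.linalg.norm(np.log1p(past_meta_features) - query, axis=1)
        return past_best_configs[int(np.argmin(dists))]

    print(recommend(n_samples=800, n_features=15))  # closest past task -> {'C': 1.0}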

Meta-learning methods include:

  • Learning from model evaluations
  • Learning from task properties, described by meta-features (for example, the number of samples or the number of features)
  • Learning from existing models, including:
    • Transfer learning
    • RNNs that modify their own weights during the learning process

Learning a complex model from very little training data is a big challenge; this is the one-shot or few-shot learning problem in meta-learning.

Like human learning, every learning episode, successful or not, yields some experience; very few humans ever learn from scratch. When building automated learning systems, we should likewise make full use of every existing learning experience and improve gradually, making each new learning task more effective.


Neural Architecture Search (NAS)

When AutoML is mentioned, most people actually know of it because of the story of Google's AutoML system. As deep learning became popular, neural network architectures grew more and more complex, and more and more hand-crafted work followed. Neural architecture search was created to solve this problem.

NAS consists of three main components (a toy sketch follows this list):

  • Search space
  • Search strategy
  • Performance estimation strategy
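A toy sketch of how the three components fit together, using random search as a deliberately simple search strategy; the search space and the stand-in performance estimator below are illustrative assumptions, not Google's method:

    import random

    # 1. Search space: depth and width of a plain feed-forward network.
    def sample_architecture():
        return {"n_layers": random.randint(1, 6),
                "width": random.choice([32, 64, 128, 256])}

    # 2. Performance estimation strategy: a stand-in that would, in practice,
    #    train the candidate network briefly and return validation accuracy.
    def estimate_performance(arch):
        return random.random()  # placeholder score

    # 3. Search strategy: random search over the space.
    candidates = [sample_architecture() for _ in range(20)]
    best_arch = max(candidates, key=estimate_performance)
    print(best_arch)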


Automated Feature Engineering

Automated feature engineering helps data scientists automatically create, from a dataset, the best features to use for training.

Featuretools is an open-source library for automated feature engineering. It is an excellent tool designed to speed up the feature-generation process, leaving us more time to focus on other aspects of building machine learning models. In other words, it puts your data into a state that is "ready for machine learning".

Featuretools has three main components:

  • Entities: an Entity can be regarded as a representation of a Pandas DataFrame; a collection of multiple entities is called an EntitySet.

  • Deep Feature Synthesis (DFS): despite the name, DFS has nothing to do with deep learning, so don't worry. DFS is a feature engineering method and the backbone of Featuretools; it supports building new features from a single DataFrame or from multiple DataFrames.

  • Feature primitives: DFS builds new features by applying feature primitives to the entity relationships in an EntitySet. These primitives are the same methods commonly used to generate features by hand; for example, the primitive "mean" computes the average of a variable at a given aggregation level.
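A minimal end-to-end sketch, assuming the Featuretools 1.x API and a made-up pair of tables (customers and their transactions):

    import featuretools as ft
    import pandas as pd

    # Two related tables: customers and their transactions.
    customers = pd.DataFrame({"customer_id": [1, 2], "join_year": [2019, 2020]})
    transactions = pd.DataFrame({"transaction_id": [10, 11, 12],
                                 "customer_id": [1, 1, 2],
                                 "amount": [25.0, 40.0, 10.0]})

    es = ft.EntitySet(id="retail")
    es = es.add_dataframe(dataframe_name="customers", dataframe=customers,
                          index="customer_id")
    es = es.add_dataframe(dataframe_name="transactions", dataframe=transactions,
                          index="transaction_id")
    es = es.add_relationship("customers", "customer_id",
                             "transactions", "customer_id")

    # DFS stacks primitives such as "mean" across the relationship, producing
    # features like MEAN(transactions.amount) for each customer.
    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_dataframe_name="customers",
                                          agg_primitives=["mean", "count"])
    print(feature_matrix)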


