Real-Time Machine Learning with TensorFlow in StreamSets Data Collector

Original link: https://streamsets.com/blog/machine-learning-with-tensorflow-and-kafka-in-data-collector

Author: Dash Desai   /   October 18, 2018   /   Engineering, StreamSets News

       The true value of a modern DataOps platform is realized only when business users and applications can access raw and aggregated data from a range of sources and generate timely, data-driven insights. With machine learning (ML), analysts and data scientists can use historical data, along with technologies such as TensorFlow, to help make better, data-driven business decisions, both offline and in real time.

       This article describes how to use a TensorFlow model for prediction and classification in Data Collector.

      Starting with Data Collector version 3.7.0, the TensorFlow Evaluator is no longer considered a technology preview feature and is approved for use in production.

Before we get into the details, here are some basic concepts.

Machine Learning

Arthur Samuel described it as "the field of study that gives computers the ability to learn without being explicitly programmed." With recent advances in machine learning, computers can now make predictions as well as, or even better than, humans, and it can be tempting to think that ML can solve any problem. So let us first look at what kinds of problems it solves.

In general, ML is divided into two categories:

Supervised learning

"Supervised learning is a function of machine learning tasks, which according to the example input - output of the input to the output map"  - Wikipedia.

It involves building an accurate model that can predict outcomes when those outcomes are already labeled in historical data.

Common business problems solved by supervised learning (a minimal code sketch follows this list):

  • Binary classification (learn to predict a categorical value)
    - Will a customer buy a particular product?
    - Is this cancer malignant or benign?
  • Multi-class classification (learn to predict a categorical value)
    - Is the given piece of text toxic, threatening, or obscene?
    - Is this species of iris setosa, versicolor, or virginica?
  • Regression (learn to predict a continuous value)
    - What is the predicted price of a house?
    - What will the temperature be in San Francisco tomorrow?
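To make the first category concrete, here is a minimal, illustrative scikit-learn sketch (not from the original post) of supervised binary classification on labeled historical data, using the same Wisconsin breast cancer dataset that appears later in this article:

```python
# Illustrative only: learn a binary classifier from labeled examples,
# then predict on data the model has not seen.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)  # learn from labeled data
print("accuracy on unseen data:", model.score(X_test, y_test))   # predict malignant vs. benign
```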

Unsupervised Learning

Unsupervised learning allows us to approach problems where we know little or nothing about what the output should look like. It involves building models where labeled historical data is not available. In these types of problems, structure is derived by clustering the data based on relationships among the variables in the data.

Two common methods of unsupervised learning are k-means clustering and DBSCAN.
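As an illustrative aside (not from the original post), here is a minimal scikit-learn sketch contrasting those two methods on synthetic, unlabeled data:

```python
# Illustrative only: cluster unlabeled points with k-means and DBSCAN.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # labels are discarded

kmeans_labels = KMeans(n_clusters=3, random_state=42).fit_predict(X)  # needs k up front
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)         # density-based, infers clusters

print("k-means clusters:", set(kmeans_labels))
print("DBSCAN clusters:", set(dbscan_labels))
```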

Note: The TensorFlow Evaluator in Data Collector and Data Collector Edge currently supports only supervised learning models.

Neural networks and deep learning

Neural networks are a form of ML algorithm that learn using a computational model inspired by the structure of the human brain. Compared with other ML algorithms (such as decision trees, logistic regression, etc.), neural networks have proven to be highly accurate.

Deep learning is a subset of neural networks that allows a network to represent concepts in a nested hierarchy.

Andrew Ng described it in the context of traditional artificial neural networks. In his talk titled "Deep Learning, Self-Taught Learning and Unsupervised Feature Learning," he described the idea of deep learning as:

"Using brain simulation, hope:
- the learning algorithm is better, easier to use.
- revolutionary advances in machine learning and artificial intelligence.
I believe this is the best opportunity for us to be truly artificial intelligence."

Common applications of neural networks and deep learning include:

  • Computer vision / image recognition / object detection
  • Speech recognition / natural language processing (NLP)
  • Recommendation systems (products, matchmaking, etc.)
  • Anomaly detection (cybersecurity)

TensorFlow

TensorFlow is an open source ML framework created by the Google Brain team and designed for deep neural networks. TensorFlow supports scalable and portable training on Windows and macOS, across CPUs, GPUs, and TPUs. As of today, it is the most popular and most active ML project on GitHub.

 

For more detailed information, please visit TensorFlow.org.

TensorFlow in Data Collector

With the introduction of the TensorFlow Evaluator processor, you can now create pipelines that ingest data/features and generate predictions or classifications within the pipeline itself, without having to make HTTP REST API calls to a web service that exposes the ML model. For example, a Data Collector pipeline can now detect fraudulent transactions or perform natural language processing on text in real time as the data passes through the various stages, before it is stored in its final destination for further processing or decision making.

In addition, with Data Collector Edge you can run TensorFlow ML-enabled pipelines on devices such as the Raspberry Pi, as well as on other devices running supported platforms. For example, such pipelines could detect the likelihood of floods and other natural disasters in high-risk areas to help prevent damage to valuable assets.

Breast cancer classification

Let us consider the use case of classifying breast cancer tumors as malignant or benign. The (Wisconsin) breast cancer dataset is a classic dataset available as part of scikit-learn. To see how I used this dataset to train and export a simple TF model in Python, see my code on GitHub. As you will notice, the model creation and training are kept to a minimum and are quite simple, with only a few hidden layers. The most important aspect to note is how the model is exported and saved using TensorFlow SavedModelBuilder*.

* Note: To use TF models in Data Collector or Data Collector Edge, you should export/save them using TensorFlow SavedModelBuilder in a supported language of your choice (such as Python) and in an interactive environment (such as a Jupyter Notebook).
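For reference, here is a minimal, self-contained sketch of what such a training-and-export script might look like. This is not the author's GitHub code: the network shape, the tensor names ("inputs", "classes"), and the export directory ./breast_cancer_model are illustrative assumptions, using the TensorFlow 1.x SavedModelBuilder API referenced above.

```python
# Minimal sketch (TensorFlow 1.x): train a small classifier on scikit-learn's
# Wisconsin breast cancer dataset and export it with SavedModelBuilder.
import tensorflow as tf
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Simple feed-forward network: 30 input features -> one hidden layer -> 2 classes.
x = tf.placeholder(tf.float32, shape=[None, 30], name="inputs")
y = tf.placeholder(tf.int64, shape=[None], name="labels")
hidden = tf.layers.dense(x, 16, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)
classes = tf.argmax(logits, axis=1, name="classes")

loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op, feed_dict={x: X_train, y: y_train})

    # Export with SavedModelBuilder, tagging the metagraph for serving so the
    # Data Collector TensorFlow Evaluator can load it.
    builder = tf.saved_model.builder.SavedModelBuilder("./breast_cancer_model")
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={"inputs": x}, outputs={"classes": classes})
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"serving_default": signature})
    builder.save()
```

The "serve" tag and the input/output tensors chosen here are the kinds of values the TensorFlow Evaluator configuration described below refers to.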

Once the model is trained and exported using TensorFlow SavedModelBuilder, using it in a dataflow pipeline for prediction or classification is quite simple, as long as the model is saved in a location accessible to Data Collector or Data Collector Edge.

Pipeline Overview

Before delving into the details, this is what the pipeline looks like.

Pipeline details

  • Directory origin:
    - This loads the breast cancer records from a .csv file. (Note: this input data source can easily be replaced with other origins, including Kafka, AWS S3, MySQL, etc.)
  • Field Converter:
    - This processor converts all of the input record fields used by the breast cancer model from String to Float: mean_radius, mean_texture, mean_perimeter, mean_area, mean_smoothness, mean_compactness, mean_concavity, mean_concave_points, mean_symmetry, mean_fractal_dimension, radius_error, texture_error, perimeter_error, area_error, smoothness_error, compactness_error, concavity_error, concave_points_error, symmetry_error, fractal_dimension_error, worst_radius, worst_texture, worst_perimeter, worst_area, worst_smoothness, worst_compactness, worst_concavity, worst_concave_points, worst_symmetry, worst_fractal_dimension.
  • TensorFlow Evaluator*:
    - Saved Model Path: specifies the location of the pre-trained TF model to use.
    - Model Tags: set to "serve" because the metagraph in our exported model is intended to be used for serving. For more details, see tag_constants.py and the related TensorFlow API documentation.
    - Input Configs: specifies the configuration of the input tensors used while training and exporting the model. (See the section above on training the model and saving/exporting it with TensorFlow SavedModelBuilder; the sketch after this list shows one way to look up these tensor names.)
    - Output Configs: specifies the configuration of the output tensors used while training and exporting the model. (See the same section on saving/exporting the model with TensorFlow SavedModelBuilder.)
    - Output Field: the output record field in which to store the model's classification value.
  • Expression Evaluator:
    - This processor evaluates the model output/classification value of 0 or 1 (stored in the output field TF_Model_Classification) and creates a new record field 'Condition' with the value Benign or Malignant.
  • Stream Selector:
    - This processor evaluates the cancer condition (benign or malignant) and routes each record to the appropriate Kafka Producer.
  • Kafka Producers:
    - The input record, along with the model output/classification value, is conditionally routed to two Kafka Producers for further processing and analysis. (Note: these destinations can easily be replaced with other destinations, such as AWS S3, MySQL, NoSQL, etc., for further processing or analysis.)
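As a hedged aside: the tensor names and types that go into the Input Configs and Output Configs settings are the ones defined when the model was exported. One way to look them up is to load the SavedModel and print its signature; the directory ./breast_cancer_model, the "serve" tag, and the "serving_default" signature name below match the export sketch earlier in this article and are assumptions, not values from the original post.

```python
# Inspect an exported SavedModel to find the input/output tensor names and
# dtypes that the TensorFlow Evaluator's Input Configs / Output Configs describe.
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], "./breast_cancer_model")
    signature = meta_graph.signature_def["serving_default"]
    print("inputs:", signature.inputs)    # candidate values for Input Configs
    print("outputs:", signature.outputs)  # candidate values for Output Configs
```

TensorFlow's `saved_model_cli show --dir ./breast_cancer_model --all` command prints the same information from the command line.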

 

* TensorFlow evaluator configuration

 

Note: The pipeline stages that follow the TensorFlow Evaluator's model output are optional in this example and can be interchanged with other processors and destinations, depending on the requirements of the use case.

Pipeline execution

When previewing (or running) the pipeline, the input breast cancer records pass through the dataflow pipeline stages outlined above, including being served to our TensorFlow model. The final output records sent to the Kafka producers include the breast cancer features used by the model for classification, the user-defined field TF_Model_Classification holding the model output value of 0 or 1, and the corresponding cancer condition, Benign or Malignant, in the field Condition.

 
