Binary Classification with TensorFlow Using StreamSets

Original link: https://streamsets.com/blog/binary-classification-of-streaming-data-using-tensorflow-to-adls-gen1-and-adls-gen2/

Author: Rupal Shah, May 2, 2019 / StreamSets News

Over the past decade, digital transformation has advanced to the point where every system and device leaves a digital trail: from IT servers to factory floors, consumer electronics, buildings, and cars. The growth in data volume, velocity, and variety has added complexity, not least because new datasets often have to be analyzed in real time. With suitable data storage and application platforms, advanced analytics can be applied to raw data at virtually unlimited scale. The analysis may run anywhere between edge systems and a cloud provider's data center, and stream processing platforms can process the data in real time. Given response-time requirements and exponentially growing data rates, we must consider reliable ways to deliver near-real-time analytics, along with predictions, forecasts, and/or classifications.

To deliver analytics on time-critical datasets, StreamSets lets you build pipelines that ingest large or high-dimensional datasets and generate predictions or classifications within a contained environment, all without making HTTP or REST API calls to an ML model served and exposed as a web service. For example, a StreamSets pipeline can now detect fraudulent transactions or run natural language processing on text as the data passes through the pipeline's stages, before it is stored in its final destination for further processing or decision making.

Consider the use case of classifying a breast cancer tumor as malignant or benign. The (Wisconsin) breast cancer dataset is a classic dataset that ships as part of scikit-learn.
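As a quick orientation, the dataset can be inspected directly from scikit-learn. This is only a minimal sketch, assuming scikit-learn is installed; it is not the training code from the original post, and the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin breast cancer dataset bundled with scikit-learn.
dataset = load_breast_cancer()

features = dataset.data            # one row per tumor sample, one column per feature
labels = dataset.target            # integer class label (0 or 1) for each sample
class_names = list(dataset.target_names)

print("samples, features:", features.shape)
print("class names in label order:", class_names)
```

Note that scikit-learn's copy of the dataset encodes 0 as malignant and 1 as benign. A model trained elsewhere may use a different encoding, so it is worth checking before interpreting its output.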

Note: For a detailed explanation of how to use this dataset to train and export a TensorFlow model, and how to use that model in a StreamSets dataflow pipeline, see this blog post.

Once a model has been trained and exported with TensorFlow SavedModelBuilder, using it in a StreamSets dataflow pipeline for prediction or classification is straightforward. When the pipeline is previewed (or executed), the input breast cancer records pass through the pipeline's stages, including the TensorFlow model:

The final output records are sent to Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2* (shown above). The output includes the breast cancer features the model uses for classification, the model output value of 0 or 1 in the user-defined field TF_Model_Classification, and a Condition field, created by the Expression Evaluator, with the value benign or malignant.
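The labeling step can be illustrated outside StreamSets with plain Python. The sketch below is not the Expression Evaluator configuration from the original post. The field name Condition, the helper add_condition, the sample record, and above all the 0/1 mapping are assumptions made for illustration. The mapping shown follows scikit-learn's encoding of the dataset and should be verified against how the model was actually trained.

```python
from typing import Any, Dict, Mapping

# ASSUMPTION: this mapping follows scikit-learn's encoding of the dataset
# (0 = malignant, 1 = benign). Verify it against the trained model.
ASSUMED_LABELS: Mapping[int, str] = {0: "Malignant", 1: "Benign"}


def add_condition(record: Dict[str, Any],
                  labels: Mapping[int, str] = ASSUMED_LABELS,
                  source_field: str = "TF_Model_Classification",
                  target_field: str = "Condition") -> Dict[str, Any]:
    """Return a copy of the record with a human-readable label field added."""
    value = int(record[source_field])
    if value not in labels:
        raise ValueError(f"unexpected classification value: {value!r}")
    enriched = dict(record)            # leave the input record untouched
    enriched[target_field] = labels[value]
    return enriched


# Hypothetical record; the feature name and value are made up for illustration.
sample = {"mean_radius": 14.2, "TF_Model_Classification": 1}
result = add_condition(sample)
print(result["Condition"])
```

Passing the mapping as a parameter keeps the encoding decision in one visible place, so it can be swapped if the model uses the opposite convention.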

For more information about the data preparation stages of this pipeline, check out this detailed blog post.

The following is a screenshot of the pipeline, which continuously reads patient data and classifies breast tumors as benign or malignant in real time:

The original post includes a video demonstration; follow the original link above to view it.

