ML.NET Research Series, Part 1: Getting Started

Our team has recently been researching machine learning. We hope to use it to build a system that assesses patch releases and detects anomalies. In summary, the business scenario is:

  1. Collect data (release-related exception logs and alarm data) and label the outcome of each patch release (succeeded or failed).
  2. Select a machine learning model and train it on this data.
  3. Once the trained model is accurate enough, use it to predict the outcome of the latest patch release.

This is a typical supervised learning scenario. As a loyal .NET user, I naturally wanted to try the recently popular ML.NET. This article is an introduction that I'd like to share with everyone.

First, an outline:

1. ML.NET Model Builder: introduction and installation

2. A typical example scenario

I. ML.NET Model Builder: introduction and installation

   First, what is ML.NET Model Builder, and what does it do?

   https://marketplace.visualstudio.com/items?itemName=MLNET.07

   Model Builder is a simple UI tool that lets developers build, train, and publish custom machine learning models in their applications.

   Developers without ML expertise can use its simple visual interface to connect to data stored in a file or in SQL Server, train models, and generate code for both model training and model consumption.

   In one sentence: it is a visual modeling tool that lets you build a machine learning model through a Visual Studio designer, and the wizard also generates reusable sample code.

   1. Installation

   The official requirement is Visual Studio 2017 15.9.12 or later.

   I had VS2019 and VS2017 Enterprise installed. I downloaded the VS extension MLNET_Model_Builder.vsix directly from https://marketplace.visualstudio.com/items?itemName=MLNET.07 and double-clicked it to install, which failed with:

   VSIXInstaller.NoApplicableSKUsException: This extension is not installable on any currently installed products.

   The extension could not be installed on any of the Visual Studio versions I had. Googling led me to https://github.com/dotnet/machinelearning-samples/issues/451, but that did not solve the problem either, and reinstalling VS2017 and VS2019 did not help.

   Finally, following the official requirement (Visual Studio 2017 15.9.12 or later), I installed VS2017 with the installer vs_community__425161747.1541050689.

   

   This time the extension installed successfully.

   2. Create a new .NET Core console project, then right-click it and add Machine Learning

   

    The ML.NET Model Builder designer opens, and you can start building a machine learning model.

   3. Choose a scenario

   Microsoft groups typical machine learning scenarios into three main categories:

   Regression: models that predict a numeric value. Typical scenarios: price prediction, sales forecasting, etc.

   Binary classification: models that assign one of two classes. Typical scenarios: sentiment analysis of user comments (positive or negative), transaction risk prediction (risky or not).

   Multiclass classification: models that assign one of several classes. Typical scenarios: user profiling, data categorization.

   In addition, ML.NET also supports custom scenarios.
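   To make the three categories concrete, here is a minimal sketch of how they map onto ML.NET's trainer catalogs in code. The SDCA trainers are an illustrative choice and are not necessarily what Model Builder's AutoML ends up picking. The `ScenarioTrainers` class and the task strings are hypothetical. The sketch assumes the Microsoft.ML 1.x NuGet package and an upstream pipeline that already produces `Label` and `Features` columns.

```csharp
using System;
using Microsoft.ML;

// Hypothetical helper: one representative ML.NET trainer per scenario type.
// Assumes the upstream pipeline produces a "Label" column and a "Features" vector column.
public static class ScenarioTrainers
{
    public static IEstimator<ITransformer> ForTask(MLContext ml, string task)
    {
        switch (task)
        {
            case "regression":
                // Label is a float, e.g. a price or a sales figure.
                return ml.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
            case "binary":
                // Label is a bool, e.g. positive vs. negative sentiment.
                return ml.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features");
            case "multiclass":
                // Label must be a key type; convert it first with ml.Transforms.Conversion.MapValueToKey.
                return ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features");
            default:
                throw new ArgumentException($"Unknown task: {task}", nameof(task));
        }
    }
}
```

   In a real pipeline the trainer is appended after featurization, for example `ml.Transforms.Text.FeaturizeText("Features", "SentimentText").Append(...)` for text input.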

   

 4. Prepare the sample data for training

   Judging from the sample data and scenarios Microsoft provides, training data currently has to be structured, with fixed columns (dimensions) and values. In addition, the column to be predicted (the Label) has to be identified and labeled in the data.

   To summarize:

  1. The sample data must be structured, with fixed columns and values.
  2. The sample data consists of feature columns plus one column to predict (the label).
  3. The values in the label column must be labeled by hand so the model can be trained.
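   Before handing a file to Model Builder, these three requirements can be checked with a few lines of plain C#. `SampleDataCheck.Validate` is a hypothetical helper and not part of ML.NET. It assumes a tab-separated file with a header row and no embedded tabs or quoted fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: checks TSV lines against the three sample-data requirements.
public static class SampleDataCheck
{
    // Returns a list of problems; an empty list means the data looks usable.
    public static List<string> Validate(string[] lines, string labelColumn)
    {
        var problems = new List<string>();
        if (lines.Length < 2)
        {
            problems.Add("need a header row and at least one data row");
            return problems;
        }

        string[] header = lines[0].Split('\t');
        int labelIndex = Array.IndexOf(header, labelColumn);
        if (labelIndex < 0) problems.Add($"label column '{labelColumn}' not found in header");
        if (header.Length < 2) problems.Add("need at least one feature column besides the label");

        for (int i = 1; i < lines.Length; i++)
        {
            if (lines[i].Length == 0) continue; // ignore blank lines, e.g. a trailing newline
            string[] fields = lines[i].Split('\t');
            if (fields.Length != header.Length)
                problems.Add($"line {i + 1}: expected {header.Length} columns, found {fields.Length}"); // structured: fixed columns
            else if (labelIndex >= 0 && string.IsNullOrWhiteSpace(fields[labelIndex]))
                problems.Add($"line {i + 1}: label value is missing"); // every row must be labeled
        }
        return problems;
    }
}
```

   Typical usage: `var problems = SampleDataCheck.Validate(File.ReadAllLines("data.tsv"), "Sentiment");`, then fix the file until the list is empty.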

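   As the summary shows, the Model Builder scenarios described here are supervised learning.

   Supported sample data formats: CSV (comma-separated), TSV (tab-separated), and SQL Server.

   Saving data as a TSV file is easy: copy the sample data into a text editor and save it as a *.tsv file. Sample data: https://raw.githubusercontent.com/dotnet/machinelearning/master/test/data/wikipedia-detox-250-line-data.tsv

   After selecting the structured sample data, specify the column the model should predict.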
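   In code, CSV and TSV files are loaded the same way and differ only in the separator character. The sketch below assumes the Microsoft.ML 1.x package. It also assumes the two-column layout of the wikipedia-detox sample file: a `Sentiment` label (0/1) followed by `SentimentText`. `SentimentRow` and `SampleDataLoader` are hypothetical names.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Row type for a file with the header "Sentiment<TAB>SentimentText" (assumed layout).
public class SentimentRow
{
    [LoadColumn(0)] public bool Sentiment { get; set; }       // label column: the value to predict
    [LoadColumn(1)] public string SentimentText { get; set; } // feature column: the comment text
}

public static class SampleDataLoader
{
    // Picks ',' for .csv files and '\t' otherwise, then loads the rows lazily as an IDataView.
    public static IDataView Load(MLContext ml, string path)
    {
        char separator = path.EndsWith(".csv", StringComparison.OrdinalIgnoreCase) ? ',' : '\t';
        return ml.Data.LoadFromTextFile<SentimentRow>(path, separatorChar: separator, hasHeader: true, allowQuoting: true);
    }
}
```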

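5. Training and evaluation

   Specify the input data and the column to predict, then start training. During training, Model Builder uses AutoML to evaluate the accuracy of a range of algorithms.

   How long training takes varies with the amount of data.

   When training finishes, it reports the algorithm with the best accuracy and produces a model file, MLModel.zip, for later predictions.

6. Generate reusable code

   The configuration from the ML.NET Model Builder wizard is turned into reusable code: two C# projects, a Model project and a Console project.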
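   Model Builder drives AutoML for you, but the same time-boxed search can also be run from code. The sketch below assumes the Microsoft.ML.AutoML NuGet package (its experiment API, `mlContext.Auto().CreateBinaryClassificationExperiment`) and the same two-column TSV layout as the sentiment sample. `DetoxRow`, `AutoMLTraining`, and `TrainAndSave` are hypothetical names.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Assumed layout: "Sentiment<TAB>SentimentText" with a header row.
public class DetoxRow
{
    [LoadColumn(0)] public bool Sentiment { get; set; }
    [LoadColumn(1)] public string SentimentText { get; set; }
}

public static class AutoMLTraining
{
    // Runs a time-boxed binary-classification experiment, saves the best model,
    // and returns the winning trainer's name and its validation accuracy.
    public static (string Trainer, double Accuracy) TrainAndSave(
        MLContext ml, string dataPath, uint maxSeconds, string modelPath)
    {
        IDataView data = ml.Data.LoadFromTextFile<DetoxRow>(
            dataPath, separatorChar: '\t', hasHeader: true, allowQuoting: true);

        // AutoML tries several trainers within the time budget and keeps the best run.
        ExperimentResult<BinaryClassificationMetrics> result = ml.Auto()
            .CreateBinaryClassificationExperiment(maxSeconds)
            .Execute(data, labelColumnName: "Sentiment");

        RunDetail<BinaryClassificationMetrics> best = result.BestRun;
        ml.Model.Save(best.Model, data.Schema, modelPath); // e.g. "MLModel.zip"
        return (best.TrainerName, best.ValidationMetrics.Accuracy);
    }
}
```

   `maxSeconds` plays the same role as the training-time setting in the Model Builder wizard: a longer budget lets AutoML try more algorithms.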

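II. A typical example scenario

  In Part I we walked through the whole ML.NET modeling process. Now let's put it into practice with Microsoft's sample code.

  This time we choose the user feedback sentiment analysis scenario. Having thought about it over the past few days, I see its practical value like this: crawl users' online comments and feedback on a given product, use machine learning to predict the product's popularity and problems, and then feed that into product improvements and marketing campaigns.

  Without further ado, let's begin.

  1. Prepare the TSV data

   This is very simple: copy the text at https://raw.githubusercontent.com/dotnet/machinelearning/master/test/data/wikipedia-detox-250-line-data.tsv into Sublime Text and save it as data.tsv.

  2. Create a new .NET Core console app, right-click it and add Machine Learning

    In the scenario selection step, choose the first one, "Sentiment Analysis".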

  

  3. Select the sample data for training and prediction

  Select the data.tsv file prepared in step 1 and specify Sentiment as the column to predict.

  

  4. Train on the sample data

   How long to train depends on the size of the data set.

  

   


   We tried training for both 10 s and 30 s. The best algorithm and its accuracy stayed the same; the longer run simply tried more algorithms.

  

  5. Generate the reusable code projects

   

 

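  After code generation, two new projects appear in the current solution, a Model project and a Console project. Let's take a closer look.

  

  The Model project mainly contains:

  The model's input and output classes:

  •   The input class ModelInput is a structured description of our input data.
  •   The output class ModelOutput holds the predicted value and the model's score.

  It also contains MLModel.zip, the model file produced when training on the sample data finished, which is used for later predictions.

  The Console project contains reusable code. The key part is the Main method: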
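  As a sketch of what the two data classes look like for this sentiment scenario: the property names `Sentiment` and `Prediction` follow the Main method shown in the post. The column attributes are an assumption based on ML.NET conventions, and the exact code Model Builder emits may differ between versions. The sketch assumes the Microsoft.ML 1.x package.

```csharp
using Microsoft.ML.Data;

// Input: one row of the TSV file (assumed column order: Sentiment, SentimentText).
public class ModelInput
{
    [ColumnName("Sentiment"), LoadColumn(0)]
    public bool Sentiment { get; set; }        // label column chosen in the wizard

    [ColumnName("SentimentText"), LoadColumn(1)]
    public string SentimentText { get; set; }  // comment text used as the feature
}

// Output: ML.NET binary classifiers write their decision to the "PredictedLabel" column.
public class ModelOutput
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Score { get; set; }           // raw score produced by the model
}
```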

  

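```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
// ModelInput and ModelOutput live in the generated Model project;
// add a using directive for that project's namespace here.

class Program
{
    //Machine Learning model to load and use for predictions
    private const string MODEL_FILEPATH = @"MLModel.zip";

    //Dataset to use for predictions
    private const string DATA_FILEPATH = @"C:\Users\zhougq\Desktop\Data.tsv";

    static void Main(string[] args)
    {
        MLContext mlContext = new MLContext();

        // Training code used by ML.NET CLI and AutoML to generate the model
        //ModelBuilder.CreateModel();

        ITransformer mlModel = mlContext.Model.Load(GetAbsolutePath(MODEL_FILEPATH), out DataViewSchema inputSchema);
        var predEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);

        // Create sample data to do a single prediction with it
        ModelInput sampleData = CreateSingleDataSample(mlContext, DATA_FILEPATH);

        // Try a single prediction
        ModelOutput predictionResult = predEngine.Predict(sampleData);

        Console.WriteLine($"Single Prediction --> Actual value: {sampleData.Sentiment} | Predicted value: {predictionResult.Prediction}");

        Console.WriteLine("=============== End of process, hit any key to finish ===============");
        Console.ReadKey();
    }

    // Helpers called from Main (not shown in the original excerpt), sketched with ML.NET's public API.
    // Resolves a path relative to the folder that contains the running assembly.
    public static string GetAbsolutePath(string relativePath)
    {
        string assemblyFolder = new FileInfo(typeof(Program).Assembly.Location).Directory.FullName;
        return Path.Combine(assemblyFolder, relativePath);
    }

    // Loads the TSV file and returns its first row as the sample to predict.
    private static ModelInput CreateSingleDataSample(MLContext mlContext, string dataFilePath)
    {
        IDataView dataView = mlContext.Data.LoadFromTextFile<ModelInput>(
            path: dataFilePath, hasHeader: true, separatorChar: '\t', allowQuoting: true);
        return mlContext.Data.CreateEnumerable<ModelInput>(dataView, reuseRowObject: false).First();
    }
}
```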

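   A quick walk-through of the code above:

  •     Create an MLContext.
  •     Load the trained model (MLModel.zip) through the MLContext.
  •     Provide the input data to predict on.
  •     Run the prediction and output the result (ModelOutput).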
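  The generated Main scores a single row with a PredictionEngine. `PredictionEngine` is not thread-safe, so it should not be shared across threads. When many rows arrive at once, one option is to push the whole batch through `ITransformer.Transform` instead. This sketch uses hypothetical names (`CommentInput`, `CommentPrediction`, `BatchScoring`). It assumes Microsoft.ML 1.x and a binary-classification model whose output includes a `PredictedLabel` column, as the sentiment model does.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class CommentInput
{
    [LoadColumn(0)] public bool Sentiment { get; set; }       // not needed for scoring; kept to match the training schema
    [LoadColumn(1)] public string SentimentText { get; set; }
}

public class CommentPrediction
{
    [ColumnName("PredictedLabel")] public bool Prediction { get; set; }
    public float Score { get; set; }
}

public static class BatchScoring
{
    // Scores all comments in one pass through the model instead of one Predict call per row.
    public static List<CommentPrediction> ScoreAll(MLContext ml, ITransformer model, IEnumerable<CommentInput> comments)
    {
        IDataView input = ml.Data.LoadFromEnumerable(comments); // wrap in-memory rows as an IDataView
        IDataView scored = model.Transform(input);              // adds Score / PredictedLabel columns
        return new List<CommentPrediction>(
            ml.Data.CreateEnumerable<CommentPrediction>(scored, reuseRowObject: false));
    }
}
```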

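  This code is the crux of the whole thing. Imagine the following workflow:

  1. Train and optimize the model routinely, for example every day.

  2. Feed live data in real time from sources such as Kafka or text files, and run predictions on it.

  3. Evaluate the predictions, then correct and relabel the sample data, repeating until the model's accuracy improves.

  4. Apply the results to online business decisions.

  5. Loop.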
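  Step 3 of this loop can be sketched in code: score newly labeled data with the current model and measure it. The sketch assumes Microsoft.ML 1.x and a calibrated binary model, since `Evaluate` needs a `Probability` column; for an uncalibrated model, `EvaluateNonCalibrated` is the counterpart. `LabeledComment`, `FeedbackLoop`, and the accuracy-threshold retraining policy are hypothetical.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public class LabeledComment
{
    [LoadColumn(0)] public bool Sentiment { get; set; }       // freshly (re)labeled ground truth
    [LoadColumn(1)] public string SentimentText { get; set; }
}

public static class FeedbackLoop
{
    // Scores labeled data with the current model and computes accuracy, AUC, F1, etc.
    // Requires a Probability column; use ml.BinaryClassification.EvaluateNonCalibrated otherwise.
    public static CalibratedBinaryClassificationMetrics Evaluate(MLContext ml, ITransformer model, IDataView labeledData)
    {
        IDataView scored = model.Transform(labeledData);
        return ml.BinaryClassification.Evaluate(scored, labelColumnName: "Sentiment");
    }

    // Hypothetical policy: retrain once accuracy on fresh data drops below a chosen threshold.
    public static bool NeedsRetraining(double accuracy, double minAccuracy) => accuracy < minAccuracy;
}
```

  A scheduled job could run `Evaluate` on each newly labeled batch and, when `NeedsRetraining` returns true, rerun training and replace MLModel.zip.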

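  Great, isn't it? It is simple and easy to understand, and it simplifies modeling, algorithm selection, and evaluation for us. It is a productivity tool that makes the technology accessible to everyone.

  Kudos to ML.NET.

  In future posts we will implement more business scenarios with ML.NET and share them with you step by step.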

 

周国庆

2019/6/23


Source: www.cnblogs.com/tianqing/p/11071864.html