iOS - Machine Learning (Part 3)

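Preface:

  Picking up from the previous post: last time I only gathered some theory and sample code. Recently I had time to write a demo, and it crashed all over the place...

  Briefly, the goal of this demo is still a model that automatically classifies movie reviews.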

I. Building the training data

  1. The data I originally prepared looked like this

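[
    {
        "text":"这部电影真好看","label":"好评"
    },
    {
        "text":"太烂了","label":"差评"
    },
    {
        "text":"一般般,不算差也不算好","label":"中评"
    },
    ...
]

    This data cannot be used for training as-is: as far as I can tell, iOS ML currently does not support Chinese. So I converted the Chinese text to hexadecimal (the hex digits of its UTF-8 bytes) and mapped the labels 好评 (positive), 中评 (neutral), and 差评 (negative) to the strings "2", "1", and "0" respectively.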
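
  The conversion described above can be sketched in a few lines of Swift. This is only an illustration under stated assumptions: the names `hexEncode`, `labelCodes`, `Sample`, and `encodeSample` are hypothetical and do not come from the demo, and unlike the demo code, an unknown label is rejected here instead of silently falling back to "0".

```swift
// Assumption: every name in this sketch is illustrative, not part of the demo project.

/// Hex-encodes the UTF-8 bytes of a string, two lowercase digits per byte.
func hexEncode(_ text: String) -> String {
    return text.utf8.map { byte -> String in
        let digits = String(byte, radix: 16)
        return digits.count == 1 ? "0" + digits : digits
    }.joined()
}

/// Label mapping used in this post: 好评 (positive), 中评 (neutral), 差评 (negative).
let labelCodes: [String: String] = ["好评": "2", "中评": "1", "差评": "0"]

/// One training row: hex-encoded text plus its numeric label string.
struct Sample {
    let text: String
    let label: String
}

/// Builds a training row, or returns nil when the label is not one of the three known values.
func encodeSample(text: String, label: String) -> Sample? {
    guard let code = labelCodes[label] else { return nil }
    return Sample(text: hexEncode(text), label: code)
}

if let sample = encodeSample(text: "太烂了", label: "差评") {
    print(sample.text, sample.label)  // e5a4aae78382e4ba86 0
}
```

  Returning nil for an unrecognized label is a design choice of this sketch: it keeps a mislabeled row from being counted as a negative review.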

  2. Code

    // 1. Read the JSON file
    NSMutableArray *textList = [[NSMutableArray alloc] init];
    NSData *JSONData = [NSData dataWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"MLData" ofType:@"json"]];
    NSString *str = [[NSString alloc] initWithData:JSONData encoding:NSUTF8StringEncoding];
    // Strip the stray whitespace, then parse the cleaned text
    str = [self removeSpaceAndNewline:str];
    NSData *cleanedData = [str dataUsingEncoding:NSUTF8StringEncoding];
//    NSLog(@"%@",str);

    NSError *error = nil;
    NSArray *dataPathList = cleanedData ? [NSJSONSerialization JSONObjectWithData:cleanedData options:NSJSONReadingMutableContainers error:&error] : nil;
    if (dataPathList.count == 0) {
        NSLog(@"Failed to parse JSON: %@", error);
        return;
    }

    // 2. Hex-encode the Chinese text of each entry and map its label to a number string
    [dataPathList enumerateObjectsUsingBlock:^(NSDictionary *obj, NSUInteger idx, BOOL * _Nonnull stop) {
        NSString *moviceContent = obj[@"text"]; // review text
        moviceContent = [self hexStringFromString:moviceContent];

        NSString *moviceType = obj[@"label"]; // review label
        NSString *movieTypeNum = @"0"; // unknown labels fall back to "0"
        if ([moviceType isEqualToString:@"好评"]) {          // positive
            movieTypeNum = @"2";
        } else if ([moviceType isEqualToString:@"中评"]) {   // neutral
            movieTypeNum = @"1";
        } else if ([moviceType isEqualToString:@"差评"]) {   // negative
            movieTypeNum = @"0";
        }
        NSDictionary *dict = @{@"text":moviceContent,@"label":movieTypeNum};
        [textList addObject:dict];
    }];

    // 3. Export the JSON file
    NSData *whirtData = [NSJSONSerialization dataWithJSONObject:textList options:NSJSONWritingPrettyPrinted error:nil];
    BOOL written = [whirtData writeToFile:@"/Users/sunjiaqi/Desktop/appsTrain.json" atomically:YES];
    NSLog(@"%@", written ? @"File written successfully" : @"Failed to write file");

// Removes spaces, carriage returns, newlines, and tabs.
// Note: this also removes any spaces inside the review text itself.
- (NSString *)removeSpaceAndNewline:(NSString *)str {
    NSString *temp = [str stringByReplacingOccurrencesOfString:@" " withString:@""];
    temp = [temp stringByReplacingOccurrencesOfString:@"\r" withString:@""];
    temp = [temp stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    temp = [temp stringByReplacingOccurrencesOfString:@"\t" withString:@""];
    return temp;
}

// Converts a string to the lowercase hex digits of its UTF-8 bytes (two digits per byte).
- (NSString *)hexStringFromString:(NSString *)string {
    NSData *myD = [string dataUsingEncoding:NSUTF8StringEncoding];
    const uint8_t *bytes = (const uint8_t *)myD.bytes;
    NSMutableString *hexStr = [NSMutableString stringWithCapacity:myD.length * 2];
    for (NSUInteger i = 0; i < myD.length; i++) {
        // %02x pads single-digit values with a leading zero
        [hexStr appendFormat:@"%02x", bytes[i]];
    }
    return hexStr;
}

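  Note: there is one pitfall here. My JSON file contained a lot of extra newlines and spaces, which kept the data from being read, and it took me a long time to sort out.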

After processing, the JSON file looks like this:

[
  {
    "label" : "2",
    "text" : "e8bf99e983a8e794b5e5bdb1e79c9fe5a5bde79c8b"
  },
  {
    "label" : "0",
    "text" : "e5a4aae78382e4ba86"
  },
  {
    "label" : "1",
    "text" : "e4b880e888ace888acefbc8ce4b88de7ae97e5b7aee4b99fe4b88de7ae97e5a5bd"
  },
  ...
]

  This data can now be used to train the model.
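
  Before training, it can be worth spot-checking that an encoded entry decodes back to the original review. The helper below is a minimal sketch for that check only; `hexDecode` is a hypothetical name and is not part of the demo.

```swift
import Foundation

/// Decodes a hex string (two digits per byte) back into UTF-8 text.
/// Returns nil if the length is odd, a pair does not parse as a hex byte,
/// or the resulting bytes are not valid UTF-8.
func hexDecode(_ hex: String) -> String? {
    let digits = Array(hex)
    guard digits.count % 2 == 0 else { return nil }

    var bytes: [UInt8] = []
    bytes.reserveCapacity(digits.count / 2)
    for i in stride(from: 0, to: digits.count, by: 2) {
        // Parse each pair of characters as one base-16 byte.
        guard let byte = UInt8(String(digits[i...(i + 1)]), radix: 16) else { return nil }
        bytes.append(byte)
    }
    return String(bytes: bytes, encoding: .utf8)
}

print(hexDecode("e5a4aae78382e4ba86") ?? "decode failed")  // 太烂了
```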

II. Generating the model

Open a playground and run the code directly

import Cocoa
import CreateMLUI
import CreateML

var str = "Hello, playground"

//let builder = MLImageClassifierBuilder()
//builder.showInLiveView()

// Location of the training data
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "/Users/sunjiaqi/Desktop/appsTrain.json"))

// Train the text classifier on that data
let sentimentClassifier = try MLTextClassifier(trainingData: data,
                                               textColumn: "text",
                                               labelColumn: "label")

// Evaluate the model's accuracy
//let evaluationMetrics = sentimentClassifier.evaluation(on: data, textColumn: "text", labelColumn: "label")
//let evaluationAccuracy = (1.0 - evaluationMetrics.classificationError) * 100
//print("evaluationAccuracy:\(evaluationAccuracy)")

// Export the model
let metadata = MLModelMetadata(author: "命无双",
                               shortDescription: "A model that classifies movie reviews",
                               version: "1.0")
try sentimentClassifier.write(to: URL(fileURLWithPath: "/Users/sunjiaqi/Desktop/导出的模型/SentimentClassifier.mlmodel"),
                              metadata: metadata)

  Note: this is the final model. Import it into the demo and it is ready to use.
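
  The commented-out evaluation in the playground code scores the model on the very table it was trained on, which tends to overstate accuracy. Below is a minimal sketch of a hold-out check, written in plain Swift so it does not depend on any framework. `LabeledText`, `holdOutSplit`, and `accuracy` are hypothetical names, and the `predict` closure is a stand-in for whatever call wraps the trained classifier.

```swift
/// One labeled row, e.g. hex-encoded text plus "0", "1", or "2".
struct LabeledText {
    let text: String
    let label: String
}

/// Splits samples into (train, test), sending every `k`-th sample to the test set.
func holdOutSplit(_ samples: [LabeledText], everyNth k: Int) -> (train: [LabeledText], test: [LabeledText]) {
    precondition(k > 1, "k must be at least 2")
    var train: [LabeledText] = []
    var test: [LabeledText] = []
    for (index, sample) in samples.enumerated() {
        if (index + 1) % k == 0 {
            test.append(sample)
        } else {
            train.append(sample)
        }
    }
    return (train, test)
}

/// Fraction of samples whose predicted label equals the true label (0 for an empty set).
func accuracy(on samples: [LabeledText], predict: (String) -> String) -> Double {
    guard !samples.isEmpty else { return 0 }
    let correct = samples.filter { predict($0.text) == $0.label }.count
    return Double(correct) / Double(samples.count)
}

// Ten made-up rows: even-numbered ones are labeled "2", odd-numbered ones "0".
let samples = (1...10).map { LabeledText(text: "review \($0)", label: $0 % 2 == 0 ? "2" : "0") }
let (train, test) = holdOutSplit(samples, everyNth: 5)

// A stand-in predictor that always answers "2"; a real one would call the trained model.
let score = accuracy(on: test) { _ in "2" }
print(train.count, test.count, score)  // 8 2 0.5
```

  Only the `train` part would be used for training; the score on `test` then gives a rough idea of how the model behaves on reviews it has not seen.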

III. Using the model

  1. Import the model by dragging it into the project

  2. Build a helper class for the model

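// SentimentClassifierModel.h
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface SentimentClassifierModel : NSObject

// Classifies a movie review and returns 好评 (positive), 中评 (neutral), 差评 (negative), or 未识别 (unrecognized).
+ (NSString *)judgeMoviceContentWith:(NSString *)content;

@end

NS_ASSUME_NONNULL_END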

// SentimentClassifierModel.m
#import "SentimentClassifierModel.h"
#import "SentimentClassifier.h"
#import <CoreML/CoreML.h>

@implementation SentimentClassifierModel

// Returns the classifier. If no precompiled .mlmodelc is bundled, compiles the .mlmodel at runtime.
+ (SentimentClassifier *)model {
    NSBundle *bundle = [NSBundle bundleForClass:SentimentClassifier.class];
    NSURL *mlmodelcURL = [bundle URLForResource:@"SentimentClassifier" withExtension:@"mlmodelc"];
    if (mlmodelcURL) {
        return [SentimentClassifier new];
    }

    NSString *modelPath = [bundle pathForResource:@"SentimentClassifier" ofType:@"mlmodel"];
    if (!modelPath) return nil;

    NSURL *modelURL = [NSURL fileURLWithPath:modelPath];
    mlmodelcURL = [MLModel compileModelAtURL:modelURL error:nil];
    if (!mlmodelcURL) return nil;

    SentimentClassifier *model = [[SentimentClassifier alloc] initWithContentsOfURL:mlmodelcURL error:nil];
    return model;
}

+ (NSString *)judgeMoviceContentWith:(NSString *)content {
    // Labels: 0 = 差评 (negative), 1 = 中评 (neutral), 2 = 好评 (positive); anything else stays 未识别 (unrecognized)
    NSString *judgeResult = @"未识别";

    SentimentClassifier *model = [self model];

    // Hex-encode the content, the same way the training data was encoded, before prediction
    content = [self hexStringFromString:content];
    NSError *error = nil;
    SentimentClassifierOutput *result = [model predictionFromText:content error:&error];
    if (error) {
        NSLog(@"prediction failed: %@", error);
    }
    NSLog(@"result:%@",result.label);
    if ([result.label isEqual:@"0"]) {
        judgeResult = @"差评";
    } else if ([result.label isEqual:@"1"]) {
        judgeResult = @"中评";
    } else if ([result.label isEqual:@"2"]) {
        judgeResult = @"好评";
    }
    return judgeResult;
}

// Converts a string to the lowercase hex digits of its UTF-8 bytes (two digits per byte).
+ (NSString *)hexStringFromString:(NSString *)string {
    NSData *myD = [string dataUsingEncoding:NSUTF8StringEncoding];
    const uint8_t *bytes = (const uint8_t *)myD.bytes;
    NSMutableString *hexStr = [NSMutableString stringWithCapacity:myD.length * 2];
    for (NSUInteger i = 0; i < myD.length; i++) {
        // %02x pads single-digit values with a leading zero
        [hexStr appendFormat:@"%02x", bytes[i]];
    }
    return hexStr;
}

@end

  Note: one thing to watch here is that the incoming review must be hex-encoded before it is passed to the model.

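IV. Summary

  1. Testing with data taken from the training set gives correct results, but accuracy on other data is painfully low. After all, there is no algorithm of my own involved and the data set is small, so there is plenty of room for improvement.

  2. Consider word segmentation to improve the model's accuracy: pre-process each review before the model judges it, looking for what the texts have in common.

  3. The model only hands you a judgment. The criteria behind that judgment, that is, the features, need to be clearly defined, because they determine the model's accuracy.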
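
  The word-segmentation idea in point 2 could be combined with the hex encoding by encoding each word separately, so that word boundaries survive as spaces. The sketch below assumes the review has already been split into tokens by some segmenter (on Apple platforms the NaturalLanguage framework provides a tokenizer, for example); that step is not shown. `hexEncode` and `encodeTokens` are hypothetical names, and whether this actually improves the classifier is untested.

```swift
/// Hex-encodes the UTF-8 bytes of a string, two lowercase digits per byte.
func hexEncode(_ text: String) -> String {
    return text.utf8.map { byte -> String in
        let digits = String(byte, radix: 16)
        return digits.count == 1 ? "0" + digits : digits
    }.joined()
}

/// Hex-encodes each token on its own and joins the results with single spaces,
/// skipping empty tokens.
func encodeTokens(_ tokens: [String]) -> String {
    return tokens
        .filter { !$0.isEmpty }
        .map(hexEncode)
        .joined(separator: " ")
}

// Assumption: a segmenter produced these tokens; here they are written by hand.
print(encodeTokens(["太", "烂", "了"]))  // e5a4aa e78382 e4ba86
```

  With this layout, two reviews that share a word also share an identical hex token, which is the kind of common ground the summary is looking for.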


Reposted from www.cnblogs.com/qiyiyifan/p/12357565.html