Spark Machine Learning: Browsing the API

Official Spark API documentation:
http://spark.apache.org/docs/1.6.2/api/java/index.html
http://spark.apache.org/docs/2.2.0/api/java/index.html

1. RDD support is the foundation of Spark.
2. Look up the API according to what you need.

I. Spark functional modules
Spark SQL
Spark GraphX
Spark Streaming
Spark ML
Spark MLlib

II. Commonly used machine learning APIs
ml: takes a DataFrame as input (the input comes from Spark SQL)
mllib: takes a plain RDD as input (the input comes from HDFS)

Example: given userId (user ID), productId (product ID), and a rating, recommend products to users. Collaborative filtering is used to find which other products a user is likely to be interested in.
Common algorithm: ALS (alternating least squares), class ALS in org.apache.spark.ml.recommendation
Supervised classification: org.apache.spark.mllib.classification; the users are labeled in advance
Unsupervised learning (clustering): mllib.clustering, used in the same way, for example KMeans
Decision trees: mllib.tree
Graph computation: org.apache.spark.graphx
org.apache.spark.sql: when the data has been loaded into MySQL, this package is how we bring it into Spark, run machine learning on it for prediction and statistical analysis, and then write the results to HDFS.

IV. API extensions
Data can be read from MySQL and Oracle:
org.apache.spark.sql
org.apache.spark.sql.api.java
org.apache.spark.sql.expressions
org.apache.spark.sql.hive
org.apache.spark.sql.hive.execution
org.apache.spark.sql.jdbc
org.apache.spark.sql.sources
org.apache.spark.sql.types
org.apache.spark.sql.util

org.apache.spark.streaming is Spark's stream processing:
org.apache.spark.streaming.flume
org.apache.spark.streaming.kafka
org.apache.spark.streaming.kinesis
org.apache.spark.streaming.mqtt
org.apache.spark.streaming.receiver
org.apache.spark.streaming.scheduler
org.apache.spark.streaming.twitter
org.apache.spark.streaming.util
org.apache.spark.streaming.zeromq

ml: takes a DataFrame as input (the input comes from Spark SQL)
org.apache.spark.ml
org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util

mllib: takes a plain RDD as input (the input comes from HDFS)
org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.optimization
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distribution
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util
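
To make the collaborative filtering example above more concrete, here is a minimal sketch of the idea behind ALS, applied to (userId, productId, rating) triples. It is plain Python, not Spark code, and it is not how Spark implements ALS: it assumes a single latent factor per user and per product (rank 1), so each least-squares step reduces to a scalar formula. The function names als_rank1 and predict, the parameters num_iters and reg, and the toy ratings are all made up for illustration.

```python
from collections import defaultdict


def als_rank1(ratings, num_iters=20, reg=0.01):
    """Fit rating ~= user_factor[u] * item_factor[i] by alternating least squares.

    ratings: iterable of (user_id, product_id, rating) triples.
    Rank 1 is an illustrative simplification; real ALS uses vectors per user/item.
    """
    by_user = defaultdict(list)   # user_id -> [(product_id, rating), ...]
    by_item = defaultdict(list)   # product_id -> [(user_id, rating), ...]
    for user_id, product_id, rating in ratings:
        by_user[user_id].append((product_id, rating))
        by_item[product_id].append((user_id, rating))

    user_factor = {u: 1.0 for u in by_user}
    item_factor = {i: 1.0 for i in by_item}

    for _step in range(num_iters):
        # Hold item factors fixed; solve a regularized least-squares
        # problem for each user (closed form because the factor is a scalar).
        for u, rated in by_user.items():
            num = sum(r * item_factor[i] for i, r in rated)
            den = reg + sum(item_factor[i] ** 2 for i, r in rated)
            user_factor[u] = num / den
        # Hold user factors fixed; solve the same problem for each item.
        for i, rated in by_item.items():
            num = sum(r * user_factor[u] for u, r in rated)
            den = reg + sum(user_factor[u] ** 2 for u, r in rated)
            item_factor[i] = num / den

    return user_factor, item_factor


def predict(user_factor, item_factor, user_id, product_id):
    """Predicted rating for a (user, product) pair seen during fitting."""
    return user_factor[user_id] * item_factor[product_id]


# Toy data: user 2 rates everything twice as high as user 1.
# User 1 has not rated product 103, so we estimate that rating.
ratings = [
    (1, 101, 1.0), (1, 102, 2.0),
    (2, 101, 2.0), (2, 102, 4.0), (2, 103, 3.0),
]
users, items = als_rank1(ratings)
print(round(predict(users, items, 1, 103), 2))
```

The missing rating is filled in from the pattern shared by the other ratings, which is what collaborative filtering means here. In Spark, the ALS class in org.apache.spark.ml.recommendation plays this role on a DataFrame, with more than one factor per user and product.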
VI. Introduction to the Spark API
Reposted from blog.csdn.net/xsjzdrxsjzdr/article/details/85616412