Spark various examples

1. spark to weight (as a key data of each row grouping, so were the de-emphasis, and then remove the key can be a)

Original data:
 2012-3-1 A

 2012-3-2 B

 2012-3-3 C
2012-3-2 b
实现源码:
 rdd.filter(_.trim().length() > 0).map(line => (line.trim(), "")).groupByKey().sortByKey(true).keys.foreach(println)

2. Data cleaning (filtration)

Original data: 
HTTPS: //blog.csdn.net/weixin_42540606/article/details/81100882 

HTTP: //192.168.20.111:8080/ 

HTTPS: //www.cnblogs.com/redhat0019/p/8665491 .html 

HTTP: / /192.168.20.111:50070/dfshealth.html # the Tab-the Overview 

HTTP: //192.168.20.124:1082/osgiWeb/page/hgu/index.jsp

 

Guess you like

Origin www.cnblogs.com/redhat0019/p/11242603.html