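The given access.log contains user browsing data from a telecom operator. The first field is the timestamp, the second field is the IP address,
and the third field is the website visited; the remaining fields can be ignored.
ip.txt contains the rules that map IP ranges to locations. Its rows are sorted by the decimal value of the IP address in ascending order, as the sample rows in section 2 show.
The first field is the start IP of the range, the second field is the end IP of the range,
the third field is the decimal value of the start IP, the fourth field is the decimal value of the end IP,
the fifth field is the continent, the sixth the country, the seventh the province, and the eighth the city; the remaining fields can be ignored.
Requirement: using the user behaviour data in access.log,
count the number of visits per province (each request counts as one independent visit),
and sort the provinces by visit count from highest to lowest.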
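1. Converting an IP address to a number
An IPv4 address is four bytes written in dotted decimal, and each byte can equally be written in hexadecimal. For example, 17.18.20.15 is 0x11.0x12.0x14.0x0F. Its decimal value is 15 + 20*256 + 18*256^2 + 17*256^3 = 286397455. In code, the conversion is done with binary operations (bit shifts and OR).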
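As a quick check of the arithmetic above, here is a minimal sketch that computes the value both ways, by positional arithmetic and by bit shifts. The class and method names (IpToLongDemo, byArithmetic, byShifts) are made up for this illustration and are not part of the project code; the sketch also assumes a well-formed four-part address.

```java
// Hypothetical demo class, not part of the original project.
class IpToLongDemo {

    // Positional arithmetic: a*256^3 + b*256^2 + c*256 + d.
    static long byArithmetic(String ip) {
        long value = 0;
        for (String part : ip.split("\\.")) {
            value = value * 256 + Long.parseLong(part);
        }
        return value;
    }

    // Same result with shifts: each of the four parts occupies 8 bits.
    static long byShifts(String ip) {
        String[] parts = ip.split("\\.");
        long value = 0;
        for (int i = 0; i < 4; i++) {
            value |= Long.parseLong(parts[i]) << (8 * (3 - i));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(byArithmetic("17.18.20.15")); // 286397455
        System.out.println(byShifts("17.18.20.15"));     // 286397455
        System.out.println(byShifts("1.0.1.0"));         // 16777472
    }
}
```

The last value, 16777472, matches the third field of the first row of ip.txt, whose start IP is 1.0.1.0.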
2. The IP file
ip.txt
1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.0.8.0|1.0.15.255|16779264|16781311|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.0.32.0|1.0.63.255|16785408|16793599|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.1.0.0|1.1.0.255|16842752|16843007|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.1.2.0|1.1.7.255|16843264|16844799|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.1.8.0|1.1.63.255|16844800|16859135|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.2.0.0|1.2.1.255|16908288|16908799|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.2.2.0|1.2.2.255|16908800|16909055|亚洲|中国|北京|北京|海淀|北龙中网|110108|China|CN|116.29812|39.95931
1.2.4.0|1.2.4.255|16909312|16909567|亚洲|中国|北京|北京||中国互联网信息中心|110100|China|CN|116.405285|39.904989
1.2.5.0|1.2.7.255|16909568|16910335|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.2.8.0|1.2.8.255|16910336|16910591|亚洲|中国|北京|北京||中国互联网信息中心|110100|China|CN|116.405285|39.904989
1.2.9.0|1.2.127.255|16910592|16941055|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.3.0.0|1.3.255.255|16973824|17039359|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.4.1.0|1.4.3.255|17039616|17040383|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.4.4.0|1.4.4.255|17040384|17040639|亚洲|中国|北京|北京|海淀|北龙中网|110108|China|CN|116.29812|39.95931
1.4.5.0|1.4.7.255|17040640|17041407|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.4.8.0|1.4.127.255|17041408|17072127|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.8.0.0|1.8.255.255|17301504|17367039|亚洲|中国|北京|北京|海淀|北龙中网|110108|China|CN|116.29812|39.95931
1.10.0.0|1.10.7.255|17432576|17434623|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
The rows above are the first rows of the file, which has 10,000 records in total.
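To make the field positions concrete, the following sketch parses one rule line into the four values the task actually needs. The class IpRuleLineDemo and its nested Rule type are hypothetical names used only here; the field indexes follow the description at the top of the article (index 2 and 3 for the decimal range, 6 for the province, 7 for the city).

```java
// Hypothetical demo class, not part of the original project.
class IpRuleLineDemo {

    // Holds only the fields needed for the province statistics.
    static final class Rule {
        final long start;
        final long end;
        final String province;
        final String city;

        Rule(long start, long end, String province, String city) {
            this.start = start;
            this.end = end;
            this.province = province;
            this.city = city;
        }
    }

    static Rule parse(String line) {
        // "|" is a regex metacharacter, so it must be escaped.
        // Limit -1 keeps empty fields such as the blank district column.
        String[] f = line.split("\\|", -1);
        return new Rule(Long.parseLong(f[2]), Long.parseLong(f[3]), f[6], f[7]);
    }

    public static void main(String[] args) {
        Rule r = parse("1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信"
                + "|350100|China|CN|119.306239|26.075302");
        System.out.println(r.start + "-" + r.end + " " + r.province + " " + r.city);
    }
}
```

Reading the precomputed decimal columns avoids converting the start and end IP strings again for every rule.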
3. The log file
The first 20 of 10,000 records:
20090121000132095572000|125.213.100.123|show.51.com|/shoplist.php?phpfile=shoplist2.php&style=1&sex=137|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 http://bsalsa.com/ EmbeddedWB- 14.59 from: http://bsalsa.com/ )|http://show.51.com/main.php|
20090121000132124542000|117.101.215.133|www.jiayuan.com|/19245971|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler 4.0)|http://photo.jiayuan.com/index.php?uidhash=d1c3b69e9b8355a5204474c749fb76ef|__tkist=0; myloc=50%7C5008; myage=2009; PROFILE=14469674%3A%E8%8B%A6%E6%B6%A9%E5%92%96%E5%95%A1%3Am%3Aphotos2.love21cn.com%2F45%2F1b%2F388111afac8195cc5d91ea286cdd%3A1%3A%3Ahttp%3A%2F%2Fimages.love21cn.com%2Fw4%2Fglobal%2Fi%2Fhykj_m.jpg; last_login_time=1232454068; SESSION_HASH=8176b100a84c9a095315f916d7fcbcf10021e3af; RAW_HASH=008a1bc48ff9ebafa3d5b4815edd04e9e7978050; COMMON_HASH=45388111afac8195cc5d91ea286cdd1b; pop_1232093956=1232468896968; pop_time=1232466715734; pop_1232245908=1232469069390; pop_1219903726=1232477601937; LOVESESSID=98b54794575bf547ea4b55e07efa2e9e; main_search:14469674=%7C%7C%7C00; registeruid=14469674; REG_URL_COOKIE=http%3A%2F%2Fphoto.jiayuan.com%2Fshowphoto.php%3Fuid_hash%3D0319bc5e33ba35755c30a9d88aaf46dc%26total%3D6%26p%3D5; click_count=0%2C3363619
20090121000132406516000|117.101.222.68|gg.xiaonei.com|/view.jsp?p=389|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA)|http://home.xiaonei.com/Home.do?id=229670724|_r01_=1; __utma=204579609.31669176.1231940225.1232462740.1232467011.145; __utmz=204579609.1231940225.1.1.utmccn=(direct)
20090121000132581311000|115.120.36.118|tj.tt98.com|/tj.htm|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://www.tt98.com/|
20090121000132864647000|123.197.64.247|cul.sohu.com|/20071227/n254338813_22.shtml|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://cul.sohu.com/20071227/n254338813_22.shtml|ArticleTab=visit:1; IPLOC=unknown; SUV=0901080709152121; vjuids=832dd37a1.11ebbc5d590.0.b20f858f14e918; club_chat_ircnick=JaabvxC4aaacQ; spanel=%7B%22u%22%3A%22%22%7D; vjlast=1232467312,1232467312,30
20090121000133296729000|222.55.57.176|down.chinaz.com|/|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iCafeMedia; TencentTraveler 4.0)||cnzz_a33219=0; vw33219=%3A18167791%3A; sin33219=http%3A//www.itxls.com/wz/wyfx/it.html; rtime=0; ltime=1232464387281; cnzz_eid=6264952-1232464379-http%3A//www.itxls.com/wz/wyfx/it.html
20090121000133331104000|123.197.66.93|www.pkwutai.cn|/down/downLoad-id-45383.html|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 1.7)|http://www.baidu.com/s?tn=b1ank_pg&ie=gb2312&bs=%C3%C0%C6%BC%B7%FE%D7%B0%B9%DC%C0%ED%C8%ED%BC%FE&sr=&z=&cl=3&f=8&wd=%C6%C6%BD%E2%C3%C0%C6%BC%B7%FE%D7%B0%B9%DC%C0%ED%C8%ED%BC%FE&ct=0|
20090121000133446262000|115.120.12.157|v.ifeng.com|/live/|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; CIBA)|http://www.ifeng.com/|userid=1232466610953_4339; location=186; sclocationid=10002; vjuids=22644b162.11ef4bc1624.0.63ad06717b426; vjlast=1232466614,1232467297,13
20090121000133456256000|115.120.7.240|cqbbs.soufun.com|/3110502342~-1~2118/23004348_23004348.htm|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; CIBA)||new_historysignlist=%u534E%u6DA6%u4E8C%u5341%u56DB%u57CE%7Chttp%3A//cqbbs.soufun.com/board/3110502342/%7C%7C%u9A8F%u9038%u7B2C%u4E00%u6C5F%u5CB8%7Chttp%3A//cqbbs.soufun.com/board/3110169184/%7C%7C%u793E%u533A%u4E4B%u661F%7Chttp%3A//cqbbs.soufun.com/board/sqzx/%7C%7C; SoufunSessionID=2y5xyr45kslc0zbdooqnoo55; viewUser=1; vjuids=-870e9088.11ee89aba57.0.be9c3d988def8; vjlast=1232263101,1232380806,11; new_viewtype=1; articlecolor=#000000; usersms_pop_type=1; articlecount=186; __utma=101868291.755195653.1232450942.1232450942.1232450942.1; __utmz=101868291.1232450942.1.1.utmccn=(referral)
20090121000133586141000|117.101.219.241|12.zgwow.com|/launcher/index.htm|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)||
20090121000133744103000|123.197.49.171|2.82yyy.com|/32/webpage/L/2.Html|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; QQDownload 1.7; TencentTraveler ; Maxthon; .NET CLR 1.1.4322)|http://2.82yyy.com/32/webpage/L/1.Html|cnzz_a998284=3; vw998284=%3A52225577%3A68566865%3A68566789%3A68566815%3A; sin998284=none; rtime=0; ltime=1232466017187; cnzz_eid=1870962-1232464084-; cnzz_a1021073=3; vw1021073=%3A34926533%3A; sin1021073=none; 61kkk=1,1232464210281
20090121000133757842000|117.101.213.104|game.7679.com|/scroll.php|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA)|http://game.7679.com/games/1021/|cnzz_a30008507=7; rtime=2; ltime=1232466389781; cnzz_eid=12877395-http%3A//apps.51.com/application.php%3Fapp_key%3D4a99277cca695a34ba39719399030076; 4a99277cca695a34ba39719399030076_user=tangqingqing33; 4a99277cca695a34ba39719399030076_session_key=1b203792173c71e961fd8cafdf011f9d; 4a99277cca695a34ba39719399030076_time=1232466378; 4a99277cca695a34ba39719399030076=cf9441753f0b3312fd76b18b68261287
20090121000134038848000|115.120.10.205|bf.bearcn.com|/user.asp?userid=9795|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler ; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727)|http://bf.bearcn.com/Photo.asp?page=7|ASPSESSIONIDCATABTSR=MDIFCDPCMMGJBDEBMJCPCGHF; BearCN=viewid=20944; ASPSESSIONIDCAQDCQSR=OEDPCHPCOJIKCGECBIFLAOGI
20090121000134178887000|117.101.218.147|www.baidu.com|/|test||BAIDUID=4221AC111420E40EFA125AEC596813B7:FG=1
20090121000134259104000|115.120.17.80|www.sjshu.com|/bookdown/ShowSoftDown.asp?UrlID=1&SoftID=22222|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 1.7; .NET CLR 2.0.50727)|http://www.sjshu.com/bookdown/200803/22222.shtml|ASPSESSIONIDQQASTRAD=CEJGLDLCADCNKJOPLAMEKDJJ; AJSTAT_ok_pages=3; AJSTAT_ok_times=1; ppad_cookie_0=1
20090121000134372468000|117.101.220.175|www.zhaodll.com|/dll/softdown.asp?softid=306|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)|http://www.zhaodll.com/dll/k/200607/306.html|ASPSESSIONIDQSTACDQD=HGCADNMCGHEGNPAFENHNDHKM; cnzz_a206791=0; vw206791=%3A60592519%3A; sin206791=http%3A//www.baidu.com/s%3Fwd%3Dksuser.dll%26tn%3Dyjhy_dg%26bar%3D; rtime=0; ltime=1232467247934; cnzz_eid=75465914-1232467247-http%3A//www.baidu.com/s%3Fwd%3Dksuser.dll%26tn%3Dyjhy_dg%26bar%3D
20090121000134389671000|123.197.66.12|video.baidu.com|/v?ct=301989888&rn=20&pn=0&db=0&s=8&word=%C9%AB%BC%B4%CA%C7%BF%D52%CC%F0%D0%D4%C9%AC%B0%AE|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)|http://web.gougou.com/bfs?search=%e8%89%b2%e5%8d%b3%e6%98%af%e7%a9%ba%32%e7%94%9c%e6%80%a7%e6%b6%a9%e7%88%b1&page=1&id=10000001&f=1&t=-1|BAIDUID=7AEB83E2A2E24FE200CE048A70DCBD9E:FG=1; BDSTAT=cd2b93848358eb180fb30f2442a7d933c895d143ad4bd11373f0820258afe324
20090121000134422762000|125.213.100.236|longma168.com|/al/468x60-1.htm|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)|http://www.longwang.biz/pm.php?action=view&pmid=1985410|
20090121000134802198000|123.197.66.208|webim.51.com|/webim/main.php?to=|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA)|http://my.51.com/webim/index.php?refer=/|
20090121000135066307000|115.120.19.122|www.lrdvd.cn|/VodHtml/LRDVD_JOEMKPRME4.Html|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; QQDownload 1.7; TencentTraveler 4.0; .NET CLR 2.0.50727)|http://www.lrdvd.cn/|cnzz_a1024800=10; vw1024800=%3A15617943%3A67204680%3A33591841%3A406038289%3A67204963%3A30929190%3A67203568%3A32729070%3A; sin1024800=none; rtime=0; ltime=1232459483187; cnzz_eid=75914052-1232452930-; CFWztgShowCookie=1; ASPSESSIONIDSQRTDCDS=DCKNIFOCLJHGHLAIIIBJNJKD; urlid=http%3A%2F%2Fdy1%2Ejhdvd%2Enet%3A8032%2F418945609%2FC1F3DF52DBD9A8E72A0C2CC446BC15CB9DEB792A%2Flrdvd%2Ecn%5F%D0%C2%CB%C0%CD%F6%D3%CE%CF%B7%2D%C0%C1%C8%CBDVD%D3%B0%D4%BA%2Ermvb; serverip=Qvod
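Only the first three fields of each log record matter for this task, and the later fields (user agent, referrer, cookies) are long. A small sketch of one way to extract just those three, assuming every record has at least three "|"-separated fields; AccessLogLineDemo and firstThreeFields are hypothetical names for this illustration:

```java
// Hypothetical demo class, not part of the original project.
class AccessLogLineDemo {

    // Returns { timestamp, ip, host } from one access.log record.
    static String[] firstThreeFields(String line) {
        // A limit of 4 stops splitting after the third separator, so the
        // rest of the record stays in one unsplit tail element.
        String[] f = line.split("\\|", 4);
        if (f.length < 3) {
            throw new IllegalArgumentException("Malformed record: " + line);
        }
        return new String[] { f[0], f[1], f[2] };
    }

    public static void main(String[] args) {
        String line = "20090121000134178887000|117.101.218.147|www.baidu.com|/|test|"
                + "|BAIDUID=4221AC111420E40EFA125AEC596813B7:FG=1";
        String[] f = firstThreeFields(line);
        System.out.println(f[0]); // 20090121000134178887000
        System.out.println(f[1]); // 117.101.218.147
        System.out.println(f[2]); // www.baidu.com
    }
}
```

Checking the field count up front also guards against truncated lines, which would otherwise surface as an ArrayIndexOutOfBoundsException.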
4. Implementation
Approach 1:
IpBean.java
/**
 * One rule from ip.txt: an IP range plus the location it maps to.
 * The field "optioner" holds the network operator (ISP) column.
 */
public class IpBean {
    private String startIp;
    private String endIp;
    private long startDecIp;
    private long endDecIp;
    private String province;
    private String city;
    private String optioner;

    // Sets all fields in one call.
    public void set(String startIp, String endIp, long startDecIp, long endDecIp,
            String province, String city, String optioner) {
        this.startIp = startIp;
        this.endIp = endIp;
        this.startDecIp = startDecIp;
        this.endDecIp = endDecIp;
        this.province = province;
        this.city = city;
        this.optioner = optioner;
    }

    public String getStartIp() {
        return startIp;
    }

    public void setStartIp(String startIp) {
        this.startIp = startIp;
    }

    public String getEndIp() {
        return endIp;
    }

    public void setEndIp(String endIp) {
        this.endIp = endIp;
    }

    public long getStartDecIp() {
        return startDecIp;
    }

    public void setStartDecIp(long startDecIp) {
        this.startDecIp = startDecIp;
    }

    public long getEndDecIp() {
        return endDecIp;
    }

    public void setEndDecIp(long endDecIp) {
        this.endDecIp = endDecIp;
    }

    public String getProvince() {
        return province;
    }

    public void setProvince(String province) {
        this.province = province;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public String getOptioner() {
        return optioner;
    }

    public void setOptioner(String optioner) {
        this.optioner = optioner;
    }

    @Override
    public String toString() {
        return "IpBean [startIp=" + startIp + ", endIp=" + endIp + ", startDecIp=" + startDecIp
                + ", endDecIp=" + endDecIp + ", province=" + province + ", city=" + city
                + ", optioner=" + optioner + "]";
    }
}
Import JUnit to run the unit test:
import org.junit.Test;

public class IpUtilesTest {

    @Test
    public void testStrIpToLongIp() {
        long ip = IpUtils.strIpToLongIp("1.26.212.0");
        // 1*256^3 + 26*256^2 + 212*256 + 0 = 18535424
        if (ip == 18535424L) {
            System.out.println("OK");
        } else {
            throw new RuntimeException("NG");
        }
    }
}
IpUtils.java (utility class, very important)
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class IpUtils {

    // Loaded once so that the rule file is not re-read for every lookup.
    public static List<IpBean> ipBeanList = null;

    static {
        ipBeanList = getIpBeanList();
    }

    public static void main(String[] args) {
        // System.out.println(strIpToLongIp("1.0.1.0"));
        List<IpBean> ipBeanList = getIpBeanList();
        System.out.println(ipBeanList.size());
    }

    /**
     * Converts a dotted-decimal IP string to its long value.
     * @param str the IP address, for example "1.0.1.0"
     * @return the numeric value, or 0 if str is null
     */
    public static long strIpToLongIp(String str) {
        if (str == null) {
            return 0L;
        }
        long newIp = 0;
        String[] split = str.split("\\.");
        for (int i = 0; i <= 3; i++) {
            long lL = Long.parseLong(split[i]);
            // (3 - i) << 3 is the shift in bits: 24, 16, 8, 0
            newIp |= lL << ((3 - i) << 3);
        }
        return newIp;
    }

    /**
     * Reads ip.txt and returns the rules as a list of IpBean, in file order.
     * @return the list of rules (empty if the file cannot be read)
     */
    public static List<IpBean> getIpBeanList() {
        List<IpBean> list = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("../案例练习4/src/ch03/ip.txt"));) {
            String line = null;
            // 1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
            while ((line = br.readLine()) != null) {
                // System.out.println(line);
                String[] split = line.split("\\|");
                String startIp = split[0];
                String endIp = split[1];
                long startDecIp = Long.parseLong(split[2]);
                long endDecIp = Long.parseLong(split[3]);
                String province = split[6];
                String city = split[7];
                String optioner = split[9];
                // System.out.println(optioner);
                IpBean bean = new IpBean();
                bean.set(startIp, endIp, startDecIp, endDecIp, province, city, optioner);
                list.add(bean);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }

    /**
     * Finds the IpBean for a long IP by scanning the list linearly.
     * Superseded by the binary search in getIpBeanByLongIpNew.
     * @param longIp the numeric IP
     * @return the matching IpBean, or null if no range contains the IP
     */
    @Deprecated
    public static IpBean getIpBeanByLongIp(long longIp) {
        for (IpBean ipBean : ipBeanList) {
            if (longIp >= ipBean.getStartDecIp() && longIp <= ipBean.getEndDecIp()) {
                return ipBean;
            }
        }
        return null;
    }

    /**
     * Finds the IpBean for a long IP with a binary search.
     * Relies on ipBeanList being sorted by range in ascending order.
     * @param longIp the numeric IP
     * @return the matching IpBean, or null if no range contains the IP
     */
    public static IpBean getIpBeanByLongIpNew(long longIp) {
        int start = 0;
        int end = ipBeanList.size() - 1;
        while (start <= end) {
            int middle = (start + end) / 2;
            IpBean ipBean = ipBeanList.get(middle);
            if (longIp < ipBean.getStartDecIp()) {
                // Below this range: continue in the left half
                end = middle - 1;
            } else if (longIp > ipBean.getEndDecIp()) {
                // Above this range: continue in the right half
                start = middle + 1;
            } else {
                // Inside this range: found
                return ipBean;
            }
        }
        return null;
    }
}
TestJunit.java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

/**
 * JUnit test methods must take no parameters and must not return a value.
 * @author Administrator
 */
public class TestJunit {

    // Runs before each @Test method
    @Before
    public void testBefore() {
        System.out.println("testBefore");
    }

    @Test
    public void testTest() {
        System.out.println("testTest");
    }

    @Test
    public void testTest2() {
        System.out.println("testTest2");
    }

    // Runs after each @Test method
    @After
    public void testAfter() {
        System.out.println("testAfter");
    }
}
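Returning to the lookup in IpUtils: the binary search only works because the ranges in ip.txt are sorted in ascending order and do not overlap, and it has to cope with gaps between ranges. The sketch below reproduces the same idea on plain arrays so it can be run without the data file. RangeLookupDemo and find are hypothetical names, and the three ranges are the first three rows of the ip.txt excerpt.

```java
// Hypothetical demo class, not part of the original project.
class RangeLookupDemo {

    // Returns the index of the range containing ip, or -1 if ip falls in a gap.
    // Assumes starts/ends describe ascending, non-overlapping ranges.
    static int find(long[] starts, long[] ends, long ip) {
        int lo = 0;
        int hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (ip < starts[mid]) {
                hi = mid - 1;          // ip lies before this range
            } else if (ip > ends[mid]) {
                lo = mid + 1;          // ip lies after this range
            } else {
                return mid;            // starts[mid] <= ip <= ends[mid]
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] starts = { 16777472L, 16779264L, 16785408L };
        long[] ends   = { 16778239L, 16781311L, 16793599L };
        System.out.println(find(starts, ends, 16780000L)); // 1
        System.out.println(find(starts, ends, 16778240L)); // -1 (1.0.4.0 is in a gap)
    }
}
```

The -1 case corresponds to the null that getIpBeanByLongIpNew returns, so any caller has to handle an address that matches no rule.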
TestMain.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

public class TestMain {

    public static void main(String[] args) {
        // Maps province -> number of visits
        Map<String, Integer> map = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader("../案例练习4/src/ch03/access.log"));) {
            String line = null;
            // 20090121001921274475000|115.120.13.157|www.qqkuai.com|/Player.asp?2811,5,6,24.h
            while ((line = br.readLine()) != null) {
                // System.out.println(line);
                String[] split = line.split("\\|");
                String strIp = split[1];
                // Convert the string IP to its long value
                long longIp = IpUtils.strIpToLongIp(strIp);
                // Look up the IpBean for that long value
                IpBean bean = IpUtils.getIpBeanByLongIpNew(longIp);
                // System.err.println(bean);
                // Skip IPs that match no range in ip.txt (the lookup returns null)
                if (bean == null) {
                    continue;
                }
                // Increment the counter for the province
                String province = bean.getProvince();
                Integer count = map.getOrDefault(province, 0);
                count++;
                map.put(province, count);
            }
            /*for (Entry<String, Integer> entry : map.entrySet()) {
                System.out.println(entry);
            }*/
            // Sort the map entries by visit count, highest first
            Set<Entry<String, Integer>> entrySet = map.entrySet();
            List<Entry<String, Integer>> list = new ArrayList<>(entrySet);
            Collections.sort(list, new Comparator<Entry<String, Integer>>() {
                @Override
                public int compare(Entry<String, Integer> o1, Entry<String, Integer> o2) {
                    // compareTo avoids the overflow risk of subtracting the values
                    return o2.getValue().compareTo(o1.getValue());
                }
            });
            // Print the sorted result
            for (Entry<String, Integer> entry : list) {
                System.out.println(entry);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
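The count-and-sort step in TestMain can also be written more compactly with Map.merge and Map.Entry.comparingByValue from the standard library. This is only a sketch of an alternative, not the approach used above; ProvinceCountDemo and countAndSort are hypothetical names, and the province names in main are placeholder sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original project.
class ProvinceCountDemo {

    // Counts occurrences of each province and returns them, highest count first.
    static List<Map.Entry<String, Integer>> countAndSort(List<String> provinces) {
        Map<String, Integer> counts = new HashMap<>();
        for (String province : provinces) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(province, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder input: one entry per request, already resolved to a province.
        List<String> provinces = Arrays.asList(
                "Guangdong", "Fujian", "Guangdong", "Beijing", "Guangdong", "Fujian");
        for (Map.Entry<String, Integer> entry : countAndSort(provinces)) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

With the sample input this prints Guangdong=3, Fujian=2, Beijing=1. Provinces with equal counts come out in no guaranteed order, because HashMap iteration order is unspecified.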