What is a session?
Data: working with time intervals
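The fields in the data are:
Visitor's IP address
Visitor's access time
URL and protocol requested by the visitor
Site's response code
Size of the data returned by the site
Visitor's referral URL (the site the visitor came from)
Visitor's client operating system and browser information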
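As an illustration of how these seven fields map onto one log line, here is a minimal sketch that pulls them out with a single regular expression. The class name `LogLineParser` and the exact pattern are assumptions made for this example; they are not part of the original implementation, which extracts only the IP, time, and URL with three separate patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class LogLineParser {
    // Layout assumed from the sample data:
    // ip ident user [time] "request" status bytes "referer" "user-agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    /** Returns the seven fields in order, or null when the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "163.177.71.12 - - [18/Sep/2013:06:49:33 +0000] "
            + "\"HEAD / HTTP/1.1\" 200 20 \"-\" \"DNSPod-Monitor/1.0\"";
        // Prints ip, time, request, status, bytes, referer, user agent, one per line
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

Lines that do not follow this layout return null, so a caller can skip them during data cleaning.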
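Requirements:
1) Extract every session from the access log (if the gap between two adjacent requests from the same user is less than 30 minutes, both requests belong to the same session; otherwise they belong to different sessions), and number the requests within each session in order, as illustrated below:
session ID   IP address   request time          request URL   request order   other fields ......
session1     ip1          2017-10-11 08:10:30   /a            1               ......
session1     ip1          2017-10-11 08:11:20   /b            2               ......
session2     ip1          2017-10-11 09:10:30   /c            1               ......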
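Flow:
Basic processing: read the file, split each line, and extract the useful fields
1: Put records with the same IP together (grouping)
2: Within each group, sort the records by time in ascending order
3: Compare the times of adjacent records ----> assign session IDs and request order numbers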
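The core of step 3 can be sketched for a single visitor as follows. This is a simplified illustration, not the full implementation: the class name `SessionSplitter` is hypothetical, request times are plain epoch milliseconds, and sessions are numbered 1, 2, 3, ... instead of receiving generated IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class SessionSplitter {
    // A gap of 30 minutes or more starts a new session
    static final long GAP_MILLIS = 30 * 60 * 1000L;

    /**
     * Takes one visitor's request times (epoch millis, any order) and returns,
     * in ascending time order, a {sessionNumber, orderWithinSession} pair per request.
     */
    static List<int[]> split(List<Long> times) {
        List<Long> sorted = new ArrayList<>(times);
        Collections.sort(sorted);
        List<int[]> labels = new ArrayList<>();
        int session = 0;
        int order = 0;
        Long previous = null;
        for (long current : sorted) {
            if (previous == null || current - previous >= GAP_MILLIS) {
                session++;   // first request, or the gap is too large: new session
                order = 1;
            } else {
                order++;     // same session: next request in sequence
            }
            labels.add(new int[] {session, order});
            previous = current;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Offsets matching 08:10:30, 08:11:20 and 09:10:30 from the example table
        for (int[] label : split(Arrays.asList(0L, 50_000L, 3_600_000L))) {
            System.out.println("session" + label[0] + " order " + label[1]);
        }
    }
}
```

For the three example requests this yields session 1 with orders 1 and 2, then session 2 with order 1, which matches the table in requirement 1.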
2) Summarize each session to obtain its entry page, exit page, total duration, and so on, as illustrated below:
session ID   IP address   start request time    end request time      entry page   exit page   duration
session1     ip1          2017-10-11 08:10:30   2017-10-11 08:11:20   /a           /b          50 seconds
session2     ip1          2017-10-11 09:10:30   2017-10-11 09:10:30   /c           /c          default value
session3     ip2          2017-10-11 07:15:10   2017-10-11 07:30:10   /h           /x          900 seconds
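A minimal sketch of this summary step, assuming the requests of one session are already sorted by time. The names `SessionSummary` and `Hit` are hypothetical, and a single-request session is reported with a duration of 0 seconds here, which is only one possible choice for the "default value" in the table.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class SessionSummary {
    /** One request: its time in epoch millis and the requested URL. */
    static class Hit {
        final long timeMillis;
        final String url;

        Hit(long timeMillis, String url) {
            this.timeMillis = timeMillis;
            this.url = url;
        }
    }

    /**
     * Builds one tab-separated summary line: sessionId, ip, entry page, exit page, seconds.
     * The hits must be non-empty and sorted by time in ascending order.
     */
    static String summarize(String sessionId, String ip, List<Hit> hits) {
        Hit first = hits.get(0);
        Hit last = hits.get(hits.size() - 1);
        // Assumption: a session with a single request gets a duration of 0
        long seconds = (last.timeMillis - first.timeMillis) / 1000;
        return String.join("\t", sessionId, ip, first.url, last.url, String.valueOf(seconds));
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(0L, "/a"), new Hit(50_000L, "/b"));
        System.out.println(summarize("session1", "ip1", hits));
    }
}
```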
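Step analysis:
1 Read the log file, extract the request data, and group it by user IP (Map)
2 Sort each user's requests by time
3 Check whether the time difference between two adjacent requests is under 30 minutes to decide whether they belong to the same session
4 Generate a sessionId for each request and tag it with its order within the session
5 For the second requirement, collect the requests that share a sessionId and derive the first and last requested URLs and the time difference between them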
Key topics:
Collections (storage, sorting), IO, time operations (format conversion, comparison, time difference)
Data sample (partial):
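The three time operations listed above (format conversion, comparison, time difference) can be tried out in isolation. The sketch below uses the `java.time` API rather than the `SimpleDateFormat` and `Date` classes used in the implementation code; that substitution, and the class name `LogTime`, are choices made for this example only.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class LogTime {
    // Matches timestamps such as 18/Sep/2013:06:49:18 +0000
    static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);

    /** Format conversion: log text to a date-time value. */
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FORMAT);
    }

    /** Time difference in whole seconds from the first timestamp to the second. */
    static long secondsBetween(String first, String second) {
        return Duration.between(parse(first), parse(second)).getSeconds();
    }

    /** Comparison: true when the gap between the two requests is under 30 minutes. */
    static boolean withinThirtyMinutes(String first, String second) {
        long seconds = secondsBetween(first, second);
        return seconds >= 0 && seconds < 30 * 60;
    }

    public static void main(String[] args) {
        String a = "18/Sep/2013:06:49:18 +0000";
        String b = "18/Sep/2013:06:49:23 +0000";
        System.out.println(secondsBetween(a, b));
        System.out.println(withinThirtyMinutes(a, b));
    }
}
```

`Locale.US` is needed so that the English month abbreviation is parsed the same way regardless of the machine's default locale.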
194.237.142.21 - - [18/Sep/2013:06:49:18 +0000] "GET /wp-content/uploads/2013/07/rstudio-git3.png HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible;)"
183.49.46.228 - - [18/Sep/2013:06:49:23 +0000] "-" 400 0 "-" "-"
163.177.71.12 - - [18/Sep/2013:06:49:33 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
163.177.71.12 - - [18/Sep/2013:06:49:36 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
101.226.68.137 - - [18/Sep/2013:06:49:42 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
101.226.68.137 - - [18/Sep/2013:06:49:45 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
60.208.6.156 - - [18/Sep/2013:06:49:48 +0000] "GET /wp-content/uploads/2013/07/rcassandra.png HTTP/1.0" 200 185524 "http://cos.name/category/software/packages/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "GET /images/my.jpg HTTP/1.1" 200 19939 "http://www.angularjs.cn/A00n" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
222.68.172.190 - - [18/Sep/2013:06:50:08 +0000] "-" 400 0 "-" "-"
183.195.232.138 - - [18/Sep/2013:06:50:16 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
183.195.232.138 - - [18/Sep/2013:06:50:16 +0000] "HEAD / HTTP/1.1" 200 20 "-" "DNSPod-Monitor/1.0"
66.249.66.84 - - [18/Sep/2013:06:50:28 +0000] "GET /page/6/ HTTP/1.1" 200 27777 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
221.130.41.168 - - [18/Sep/2013:06:50:37 +0000] "GET /feed/ HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
157.55.35.40 - - [18/Sep/2013:06:51:13 +0000] "GET /robots.txt HTTP/1.1" 200 150 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
50.116.27.194 - - [18/Sep/2013:06:51:35 +0000] "POST /wp-cron.php?doing_wp_cron=1379487095.2510800361633300781250 HTTP/1.0" 200 0 "-" "WordPress/3.6; http://blog.fens.me"
58.215.204.118 - - [18/Sep/2013:06:51:35 +0000] "GET /nodejs-socketio-chat/ HTTP/1.1" 200 10818 "http://www.google.com/url?sa=t&rct=j&q=nodejs%20%E5%BC%82%E6%AD%A5%E5%B9%BF%E6%92%AD&source=web&cd=1&cad=rja&ved=0CCgQFjAA&url=%68%74%74%70%3a%2f%2f%62%6c%6f%67%2e%66%65%6e%73%2e%6d%65%2f%6e%6f%64%65%6a%73%2d%73%6f%63%6b%65%74%69%6f%2d%63%68%61%74%2f&ei=rko5UrylAefOiAe7_IGQBw&usg=AFQjCNG6YWoZsJ_bSj8kTnMHcH51hYQkAA&bvm=bv.52288139,d.aGc" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
58.215.204.118 - - [18/Sep/2013:06:51:36 +0000] "GET /wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1 HTTP/1.1" 304 0 "http://blog.fens.me/nodejs-socketio-chat/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
58.215.204.118 - - [18/Sep/2013:06:51:35 +0000] "GET /wp-includes/js/jquery/jquery.js?ver=1.10.2 HTTP/1.1" 304 0 "http://blog.fens.me/nodejs-socketio-chat/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
Implementation code:
Data analysis case study
import java.util.Date;

public class SessionBean {
    private String sessionId;
    private String ip;
    private Date date;
    private String url;
    private int order;

    public String getSessionId() {
        return sessionId;
    }

    public void setSessionId(String sessionId) {
        this.sessionId = sessionId;
    }

    public String getIp() {
        return ip;
    }

    public void setIp(String ip) {
        this.ip = ip;
    }

    public Date getDate() {
        return date;
    }

    public void setDate(Date date) {
        this.date = date;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public int getOrder() {
        return order;
    }

    public void setOrder(int order) {
        this.order = order;
    }

    @Override
    public String toString() {
        return "SessionBean [sessionId=" + sessionId + ", ip=" + ip + ", date=" + date
                + ", url=" + url + ", order=" + order + "]";
    }
}
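Be clear about what result the project is supposed to produce, or work from a development document; otherwise you will quickly get lost.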
import java.io.BufferedReader;
import java.io.FileReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import ch03.IpUtils; // helper class from the author's own project (not shown here)

public class TestMain {

    // Adjacent requests less than 30 minutes apart belong to the same session
    private static final long SESSION_GAP_MILLIS = 1000L * 60 * 30;

    // Compiled once instead of once per log line
    private static final Pattern IP_PATTERN = Pattern.compile("(\\d+\\.){3}\\d+");
    private static final Pattern DATE_PATTERN = Pattern.compile("\\[.+\\d+\\]");
    // Only GET and POST requests are kept; the match includes the method, e.g. "GET /feed/"
    private static final Pattern URL_PATTERN = Pattern.compile("(POST|GET)\\s(\\S)*\\s");

    public static void main(String[] args) {
        // Build the map from each IP to its list of SessionBeans
        Map<String, List<SessionBean>> map1 = getIpSessionBeanMap();
        // Sort each list by time
        sortByDate(map1);
        // Generate the sessionIds
        makeSessionId(map1);
        // Map from each sessionId to its list of SessionBeans
        Map<String, List<SessionBean>> map2 = new HashMap<>();
        for (List<SessionBean> value : map1.values()) {
            for (SessionBean sessionBean : value) {
                // Every individual record is reachable here
                map2.computeIfAbsent(sessionBean.getSessionId(), k -> new ArrayList<>())
                        .add(sessionBean);
            }
        }
        // The lists in map1 are already sorted, so each list in map2 is also in
        // ascending time order and no further sorting is needed
        for (Entry<String, List<SessionBean>> entry : map2.entrySet()) {
            String sessionId = entry.getKey();
            List<SessionBean> list = entry.getValue();
            SessionBean first = list.get(0);
            SessionBean end = list.get(list.size() - 1);
            long cha = end.getDate().getTime() - first.getDate().getTime();
            String ret = sessionId + "\t" + first.getIp() + "\t"
                    + first.getDate() + "\t" + end.getDate() + "\t" + first.getUrl()
                    + "\t" + end.getUrl() + "\t" + (cha / 1000);
            System.out.println(ret);
        }
        /* Debug output for requirement 1: every request, grouped by session
        for (Entry<String, List<SessionBean>> entry : map2.entrySet()) {
            System.out.println(entry.getKey());
            for (SessionBean sessionBean : entry.getValue()) {
                System.out.println(sessionBean);
            }
            System.out.println("---------------------------");
        }*/
    }

    /**
     * Generates the sessionId and order for every record.
     * Each list must already be sorted by time.
     * @param map1 IP to that IP's sorted records
     */
    private static void makeSessionId(Map<String, List<SessionBean>> map1) {
        for (List<SessionBean> list : map1.values()) {
            SessionBean previous = null;
            for (SessionBean current : list) {
                if (previous != null && isSameSession(previous, current)) {
                    // Same session: reuse the id and continue the numbering
                    current.setSessionId(previous.getSessionId());
                    current.setOrder(previous.getOrder() + 1);
                } else {
                    // First record of this IP, or the gap is too large: new session
                    current.setSessionId(getSessionId(current.getIp()));
                    current.setOrder(1);
                }
                previous = current;
            }
        }
    }

    /**
     * Decides whether two adjacent requests belong to the same session.
     * @param session1 the earlier request
     * @param session2 the later request
     * @return true if the gap is at least 0 and less than 30 minutes
     */
    private static boolean isSameSession(SessionBean session1, SessionBean session2) {
        long date1 = session1.getDate().getTime();
        long date2 = session2.getDate().getTime();
        long cha = date2 - date1;
        return cha >= 0 && cha < SESSION_GAP_MILLIS;
    }

    /**
     * Generates a sessionId from the IP plus the current nano time.
     * @param ip the visitor's IP
     * @return the generated sessionId
     */
    private static String getSessionId(String ip) {
        long longIp = IpUtils.strIpToLongIp(ip);
        long nanoTime = System.nanoTime();
        return "" + longIp + nanoTime;
    }

    /**
     * Sorts every list in the map by time.
     * @param map1 IP to that IP's records
     */
    private static void sortByDate(Map<String, List<SessionBean>> map1) {
        for (List<SessionBean> list : map1.values()) {
            // compareTo returns 0 for equal times, as the Comparator contract requires
            list.sort(Comparator.comparing(SessionBean::getDate));
        }
    }

    private static Map<String, List<SessionBean>> getIpSessionBeanMap() {
        // Holds each IP's list of SessionBeans
        Map<String, List<SessionBean>> map1 = new HashMap<>();
        try (BufferedReader br = new BufferedReader(
                new FileReader("../案例练习4/src/ch06/access.log.fensi"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String ip = getContByRegex(line, IP_PATTERN);
                String date = getContByRegex(line, DATE_PATTERN);
                String url = getContByRegex(line, URL_PATTERN);
                // Data filtering and cleaning: skip lines with a missing field
                if (url == null || date == null || ip == null) {
                    continue;
                }
                Date parsed = parseDate(date);
                if (parsed == null) {
                    continue; // unparseable time; sorting would fail on a null date
                }
                SessionBean session = new SessionBean();
                session.setIp(ip);
                session.setUrl(url.trim());
                session.setDate(parsed);
                map1.computeIfAbsent(ip, k -> new ArrayList<>()).add(session);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return map1;
    }

    /**
     * Converts a time string such as [18/Sep/2013:06:51:37 +0000] to a Date.
     * The trailing zone offset is not part of the pattern and is ignored.
     * @param date the bracketed time string
     * @return the parsed Date, or null if parsing fails
     */
    private static Date parseDate(String date) {
        String substring = date.substring(1, date.length() - 1);
        SimpleDateFormat format = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.US);
        try {
            return format.parse(substring);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Returns the first match of the pattern in the line.
     * @param line one log line
     * @param pattern the pattern to look for
     * @return the matched text, or null if there is no match
     */
    private static String getContByRegex(String line, Pattern pattern) {
        Matcher matcher = pattern.matcher(line);
        if (matcher.find()) {
            return matcher.group();
        }
        return null;
    }
}