Vanitha :
I have a few logs like below
endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688
ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651
pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713
204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970
pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669
scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124
I need to extract the paths from the above logs. Here is the code that I tried
val pattern = "\\s+([^\\s]+)\\s+HTTP".r
val matched = pattern.findFirstIn(log) // `match` is a reserved word in Scala, so it cannot be used as a name
Here is the output that I got.
/images/ HTTP
/history/gemini/gemini-spacecraft.txt HTTP
/images/launch-logo.gif HTTP
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP
/images/WORLD-logosmall.gif HTTP
/history/mercury/mr-3/mr-3.html HTTP
How do I get rid of HTTP in the above paths?
Code Maniac :
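Your path is in the first capturing group, so read group 1 instead of the whole match (findFirstIn returns the whole match, which includes HTTP).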
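Alternatively, you can use a positive lookahead, which requires HTTP to follow the path without making it part of the match. The leading \\s+ is dropped here so the result does not start with whitespace:
[^\\s]+(?=\\s+HTTP)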
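To make both suggestions concrete, here is a minimal sketch in Scala. The object and method names (ExtractPath, pathByGroup, pathByLookahead) are made up for illustration. The first regex is the one from the question. The second is a lookahead variant written without a leading \\s+, so the whole match is just the path.

```scala
object ExtractPath {
  // Regex from the question: group 1 captures the path, while the
  // surrounding whitespace and "HTTP" are matched but not captured.
  private val withGroup = "\\s+([^\\s]+)\\s+HTTP".r

  // Lookahead variant: "HTTP" must follow, but it is not consumed,
  // so the whole match is the path itself.
  private val withLookahead = "[^\\s]+(?=\\s+HTTP)".r

  // findFirstMatchIn returns a Match, which exposes the capture groups.
  def pathByGroup(log: String): Option[String] =
    withGroup.findFirstMatchIn(log).map(_.group(1))

  // findFirstIn returns only the whole matched string.
  def pathByLookahead(log: String): Option[String] =
    withLookahead.findFirstIn(log)

  def main(args: Array[String]): Unit = {
    val log =
      "pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] " +
        "\"GET /images/launch-logo.gif HTTP/1.0\" 200 1713"
    println(pathByGroup(log))     // Some(/images/launch-logo.gif)
    println(pathByLookahead(log)) // Some(/images/launch-logo.gif)
  }
}
```

Both methods return an Option, so a log line without a request path gives None instead of throwing.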