Extract a Path from a server log String using regular expressions in Scala

Vanitha :

I have a few logs like below

endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688                                   
ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651       
pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713                        
204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970  
pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669                        
scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124

I need to extract the paths from the above logs. Here is the code that I tried

val pattern = "\\s+([^\\s]+)\\s+HTTP".r
val match = pattern.findFirstIn(log)

Here is the output that I got.

/images/ HTTP
/history/gemini/gemini-spacecraft.txt HTTP
/images/launch-logo.gif HTTP
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP
/images/WORLD-logosmall.gif HTTP
/history/mercury/mr-3/mr-3.html HTTP

How do I get rid of HTTP in the above paths?

Code Maniac :

You're match is in first capturing group,

Alternatively you can use positive lookahead

\\s+[^\\s]+(?=\\s+HTTP)

Demo

enter image description here

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=155184&siteId=1