Is there any way to match a regex that starts with one string but *doesn't* start with another string?

Orel Fichman :

So I'm trying to get more familiar with Python web scraping, and for one specific function I want to find external links only. In the book I'm reading, the author implements this by simply removing the "http://" from the site's address and then checking whether each link contains the resulting string (which is the domain name without the preceding "http://").

I can see how this code might fail, and although I can simply write an if statement, it does make me wonder: is there any way to match all links that start with "http" but not with "http(s)://domain.com"? I tried many different regex solutions that I thought would work, but they haven't. Here are two of them, where the variable "site" contains the link address:

re.compile("^((?!"+site+").)^http|www*$")
re.compile("^http|www((?!"+site+").)*$")

The results I get are simply all links that start with http or www, and that's not what I intend. Again, I can implement this just fine with an if statement and filter the results, so this isn't a complete blocker, but I'm curious whether such a possibility exists.

Any help would be appreciated. I looked around the web but couldn't find anything that matches my use case.

benterris :

To match a string that starts with one string but not with another, you should use this pattern:

^(?!stringyoudontwant)stringyouwant.*

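The negative lookahead (?!...) is tested right after the ^ anchor and consumes no characters, so the match is rejected whenever the string begins with the unwanted prefix. In your case, this would be: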

^(?!https?:\/\/domain\.com)http.*
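
Since your "site" value lives in a variable, here is a minimal sketch of how the pattern could be built in Python. The helper name external_link_pattern and the sample links are made up for illustration, and I'm assuming "site" holds a bare domain such as "domain.com" rather than a full URL. re.escape is used so that the dot in the domain is matched literally:

```python
import re

def external_link_pattern(domain):
    # Hypothetical helper: builds ^(?!https?://<domain>)http.* for a bare domain.
    # re.escape keeps characters like "." from being treated as regex wildcards.
    return re.compile(r"^(?!https?://" + re.escape(domain) + r")http.*")

pattern = external_link_pattern("domain.com")

# Sample links, invented for this example.
links = [
    "http://domain.com/about",
    "https://domain.com/",
    "https://example.org/page",
    "http://other.net",
    "/relative/path",
]

# Keep only links that start with "http" but not with http(s)://domain.com
external = [link for link in links if pattern.match(link)]
print(external)  # ['https://example.org/page', 'http://other.net']
```

Note that this is a plain prefix test: a link like "https://www.domain.com/" would still count as external, and "http://domain.com.other.org" would be treated as internal. If that matters for your scraper, parsing the host out of the URL first is probably more robust than a regex alone.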

For this kind of thing, you can check out https://regex101.com, which is a perfect interface for experimenting with complicated regexes.

