Introduction
Passive information collection gathers information about a target without interacting with it directly, typically by way of search engines, social networks, and other third-party sources. Common tasks include IP queries, Whois lookups, subdomain collection, DNS resolution, and email harvesting. Because there is no interaction with the target, its information can be mined without ever touching the target system.
IP query
IP query is the process of resolving the IP address that corresponds to a URL or domain name you have already obtained.
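As a minimal sketch, the system resolver can be queried directly from Python's standard library (the domain in the comment is only an example; the result depends on your resolver and region):

```python
import socket

def ip_lookup(domain):
    """Resolve a domain name to an IPv4 address via the system resolver."""
    return socket.gethostbyname(domain)

# Example (requires network access; the address varies):
#   ip_lookup("www.baidu.com")
```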
Whois query
Whois is a query/response protocol used to look up the registration information of a domain name. Simply put, it is a database for checking whether a domain name has been registered and for retrieving the details of a registered domain, such as the domain owner and the registrar.
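The protocol itself is simple enough to sketch by hand (RFC 3912): a WHOIS request is just the query string plus CRLF sent over TCP port 43. The server below is an assumption that covers .com/.net; other TLDs use other servers, and in practice a library such as python-whois handles server selection for you.

```python
import socket

def build_whois_query(domain):
    """A WHOIS request is simply the query string terminated by CRLF."""
    return (domain.strip() + "\r\n").encode("ascii")

def whois_lookup(domain, server="whois.verisign-grs.com"):
    """Query a WHOIS server over TCP port 43 and return the raw response."""
    with socket.create_connection((server, 43), timeout=8) as sock:
        sock.sendall(build_whois_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

# Example (requires network access):
#   print(whois_lookup("baidu.com"))   # registrar, registration dates, ...
```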
Subdomain mining
Domain names form a hierarchy of top-level domains, second-level domains, and further subdomains. During testing, if no vulnerabilities are found on the target's main site, it is common to turn to the target system's subdomains. There are many subdomain-mining methods, such as search-engine queries, subdomain brute forcing, and dictionary-based lookups.
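A minimal sketch of the dictionary approach: try each candidate label against the target domain and keep the ones that resolve. The wordlist and domain below are only examples, and the resolver is injectable so the function can be tested without network access.

```python
import socket

def enum_subdomains(domain, wordlist, resolve=socket.gethostbyname):
    """Return {hostname: ip} for every wordlist entry that resolves."""
    found = {}
    for word in wordlist:
        host = "%s.%s" % (word, domain)
        try:
            found[host] = resolve(host)   # resolves -> subdomain exists
        except OSError:                   # NXDOMAIN or resolver failure
            pass
    return found

# Example (requires network access):
#   enum_subdomains("baidu.com", ["www", "mail", "news"])
```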
Among the tools currently in use, my favorite is the Layer subdomain excavator.
Email crawling
When penetrating a target system, if the target server is well hardened and difficult to compromise directly, social engineering is often used to attack it further. Email phishing is one of the most common techniques: after crawling and cleaning the email addresses found on search result pages, the harvested addresses are used to send phishing emails in bulk, tricking users or administrators into logging in or clicking a payload and thereby yielding system access.
The required libraries:
import sys
import getopt
import requests
from bs4 import BeautifulSoup
import re
The main program entry: sys.argv[0] is the path of the script itself, and sys.argv[1:] is the list of all arguments that follow it.
if __name__ == '__main__':
    # Catch a user interrupt (Ctrl+C)
    try:
        start(sys.argv[1:])
    except KeyboardInterrupt:
        print("interrupted by user, killing all threads...")
The main logic lives in the start() function. The key call to understand is getopt.getopt(), which parses the command-line arguments and returns two lists: a list of (option, value) tuples for the recognized '-' and '--' options, and a list of the remaining arguments.
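The tuple/remainder split can be seen directly in a standalone snippet (the argument list below is simulated):

```python
import getopt

# Simulated command line: script.py -u www.baidu.com -p 10 extra
argv = ["-u", "www.baidu.com", "-p", "10", "extra"]
opts, args = getopt.getopt(argv, "u:p:h")
print(opts)  # [('-u', 'www.baidu.com'), ('-p', '10')]
print(args)  # ['extra']
```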
# Main function: handle the user-supplied arguments
def start(argv):
    url = ""
    pages = ""
    if len(sys.argv) < 2:
        print("-h: show help information\n")
        sys.exit()
    # Exception handling for argument parsing
    try:
        banner()
        opts,args = getopt.getopt(argv,"u:p:h")
    except getopt.GetoptError:
        print('Error: invalid argument!')
        sys.exit()
    for opt,arg in opts:
        if opt == "-u":
            url = arg
        elif opt == "-p":
            pages = arg
        elif opt == "-h":
            usage()
    launcher(url,pages)
opts holds the parsed options; args holds the remaining command-line arguments that do not match any short or long option defined in getopt(). In the for loop, opt is the first element of each tuple (the option the user entered) and arg is the second element (the value supplied for it).
Output help information to improve the readability and usability of the tool.
# Banner
def banner():
    print('\033[1;34m########################################################################################\033[0m\n'
          '\033[1;34m###############################\033[1;32mPython Security in Practice\033[1;34m##############################\033[0m\n'
          '\033[1;34m########################################################################################\033[0m\n')

# Usage rules
def usage():
    print('-h: --help    show this help;')
    print('-u: --url     target domain;')
    print('-p: --pages   number of pages;')
    print('eg: python -u "www.baidu.com" -p 100' + '\n')
    sys.exit()
Next comes the main body of the program, which is also its core. First, write the launcher function, which coordinates the crawl and improves the robustness and extensibility of the code.
# Launcher: run the searches, deduplicate and save the results
def launcher(url,pages):
    email_num = []
    key_words = ['email','mail','mailbox','邮件','邮箱','postbox']
    for page in range(1,int(pages)+1):
        for key_word in key_words:
            bing_emails = bing_search(url,page,key_word)
            baidu_emails = baidu_search(url,page,key_word)
            sum_emails = bing_emails + baidu_emails
            for email in sum_emails:
                if email not in email_num:
                    print(email)
                    with open('data.txt','a+') as f:
                        f.write(email + '\n')
                    email_num.append(email)
Then write the two crawler functions that harvest email addresses. Because both Bing and Baidu deploy anti-crawling protection, you can bypass it by setting headers such as Referer and Cookie to mimic a normal browser session.
# bingSearch
def bing_search(url,page,key_word):
    referer = "http://cn.bing.com/search?q=email+site%3abaidu.com&qs=n&sp=-1&pq=emailsite%3abaidu.com&first=1&FORM=PERE1"
    conn = requests.session()
    bing_url = "https://cn.bing.com/search?q="+key_word+"+site%3a"+url+"&qs=n&sp=-1&pq="+key_word+"+site%3a"+url+"&first="+str((page-1)*10)+"&FORM=PERE1"
    conn.get('http://cn.bing.com',headers=headers(referer))    # warm up the session to collect cookies
    r = conn.get(bing_url,stream=True,headers=headers(referer),timeout=8)
    emails = search_email(r.text)
    return emails
# baiduSearch
def baidu_search(url,page,key_word):
    email_list = []
    referer = "https://www.baidu.com/s?wd=email+site%3Abaidu.com&pn=1"
    baidu_url = "https://www.baidu.com/s?wd="+key_word+"+site%3A"+url+"&pn="+str((page-1)*10)
    conn = requests.session()
    conn.get(referer,headers=headers(referer))
    r = conn.get(baidu_url, headers=headers(referer))
    soup = BeautifulSoup(r.text, 'lxml')
    for h3 in soup.find_all('h3'):
        a = h3.find('a')
        if a is None:          # skip result blocks without a link
            continue
        href = a.get('href')
        try:
            r = requests.get(href, headers=headers(referer), timeout=8)
            email_list.extend(search_email(r.text))
        except Exception:      # skip pages that fail to load
            continue
    return email_list
Extract the email addresses with a regular expression; optional flag modifiers control the matching behavior.
def search_email(html):
    emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+",html,re.I)
    return emails

def headers(referer):
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
        'Accept': '*/*',
        'Accept-Language':'en-US,en;q=0.5',
        'Accept-Encoding':'gzip,deflate',
        'Referer':referer}
    return headers
Modifier | Description
---|---
re.I | Makes the match case-insensitive
re.L | Does locale-aware matching
re.M | Multi-line matching; affects ^ and $
re.S | Makes . match any character, including newlines
re.U | Matches according to the Unicode character set; affects \w, \W, \b, \B
re.X | Allows more readable regular expressions through more flexible formatting
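A quick illustration of re.I with the same email pattern used above (the sample string is made up):

```python
import re

html = 'Contact: Admin@Example.COM or sales@test.org'
# With re.I the all-lowercase character classes also match upper case
emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", html, re.I)
print(emails)  # ['Admin@Example.COM', 'sales@test.org']
```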
Finally, attach the complete code
import sys
import getopt
import requests
from bs4 import BeautifulSoup
import re
# Main function: handle the user-supplied arguments
def start(argv):
    url = ""
    pages = ""
    if len(sys.argv) < 2:
        print("-h: show help information\n")
        sys.exit()
    # Exception handling for argument parsing
    try:
        banner()
        opts,args = getopt.getopt(argv,"u:p:h")
    except getopt.GetoptError:
        print('Error: invalid argument!')
        sys.exit()
    for opt,arg in opts:
        if opt == "-u":
            url = arg
        elif opt == "-p":
            pages = arg
        elif opt == "-h":
            usage()
    launcher(url,pages)

# Banner
def banner():
    print('\033[1;34m########################################################################################\033[0m\n'
          '\033[1;34m###############################\033[1;32mPython Security in Practice\033[1;34m##############################\033[0m\n'
          '\033[1;34m########################################################################################\033[0m\n')

# Usage rules
def usage():
    print('-h: --help    show this help;')
    print('-u: --url     target domain;')
    print('-p: --pages   number of pages;')
    print('eg: python -u "www.baidu.com" -p 100' + '\n')
    sys.exit()

# Launcher: run the searches, deduplicate and save the results
def launcher(url,pages):
    email_num = []
    key_words = ['email','mail','mailbox','邮件','邮箱','postbox']
    for page in range(1,int(pages)+1):
        for key_word in key_words:
            bing_emails = bing_search(url,page,key_word)
            baidu_emails = baidu_search(url,page,key_word)
            sum_emails = bing_emails + baidu_emails
            for email in sum_emails:
                if email not in email_num:
                    print(email)
                    with open('data.txt','a+') as f:
                        f.write(email + '\n')
                    email_num.append(email)

# bingSearch
def bing_search(url,page,key_word):
    referer = "http://cn.bing.com/search?q=email+site%3abaidu.com&qs=n&sp=-1&pq=emailsite%3abaidu.com&first=1&FORM=PERE1"
    conn = requests.session()
    bing_url = "https://cn.bing.com/search?q="+key_word+"+site%3a"+url+"&qs=n&sp=-1&pq="+key_word+"+site%3a"+url+"&first="+str((page-1)*10)+"&FORM=PERE1"
    conn.get('http://cn.bing.com',headers=headers(referer))    # warm up the session to collect cookies
    r = conn.get(bing_url,stream=True,headers=headers(referer),timeout=8)
    emails = search_email(r.text)
    return emails

# baiduSearch
def baidu_search(url,page,key_word):
    email_list = []
    referer = "https://www.baidu.com/s?wd=email+site%3Abaidu.com&pn=1"
    baidu_url = "https://www.baidu.com/s?wd="+key_word+"+site%3A"+url+"&pn="+str((page-1)*10)
    conn = requests.session()
    conn.get(referer,headers=headers(referer))
    r = conn.get(baidu_url, headers=headers(referer))
    soup = BeautifulSoup(r.text, 'lxml')
    for h3 in soup.find_all('h3'):
        a = h3.find('a')
        if a is None:          # skip result blocks without a link
            continue
        href = a.get('href')
        try:
            r = requests.get(href, headers=headers(referer), timeout=8)
            email_list.extend(search_email(r.text))
        except Exception:      # skip pages that fail to load
            continue
    return email_list

def search_email(html):
    emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+",html,re.I)
    return emails

def headers(referer):
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
        'Accept': '*/*',
        'Accept-Language':'en-US,en;q=0.5',
        'Accept-Encoding':'gzip,deflate',
        'Referer':referer}
    return headers

if __name__ == '__main__':
    # Catch a user interrupt (Ctrl+C)
    try:
        start(sys.argv[1:])
    except KeyboardInterrupt:
        print("interrupted by user, killing all threads...")
Sample run (screenshot omitted).