Passive information gathering

Introduction

Passive information gathering means collecting information about a target without interacting with it directly: instead of touching the target system, information is mined indirectly through search engines, social networks, and other public sources. Typical tasks include IP lookup, Whois lookup, and subdomain collection, and the main techniques include DNS resolution, subdomain mining, and email harvesting.

IP query

IP query is the process of resolving a URL or domain name that has already been obtained to its corresponding IP address.
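
As a quick illustration (not part of the tool built later in this article), the standard library's socket module is enough to resolve a domain name to its IP address; the domain below is only a placeholder:

import socket

# Resolve a domain name to its IPv4 address; the domain is a placeholder
domain = "www.baidu.com"
try:
    print(domain, "->", socket.gethostbyname(domain))
except socket.gaierror as e:
    print("resolution failed:", e)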

Whois query

Whois is a query/response protocol used to look up who owns an IP address or a domain name. Simply put, it is a database for checking whether a domain name has been registered and for retrieving its registration details, such as the domain owner and the registrar.
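
As a minimal sketch (again, not part of the script below), a raw WHOIS lookup can be done by hand over TCP port 43, the port defined by the WHOIS protocol (RFC 3912). The server and domain here are only examples; a full lookup usually has to follow the referral to the registrar's own WHOIS server:

import socket

# Bare-bones WHOIS query over TCP port 43 (RFC 3912); server and domain are examples
def whois_query(domain, server="whois.iana.org"):
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall((domain + "\r\n").encode())
        response = b""
        while True:
            data = s.recv(4096)
            if not data:
                break
            response += data
    return response.decode(errors="ignore")

print(whois_query("baidu.com"))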

Subdomain mining

Domain names form a hierarchy of top-level, second-level, and lower-level domains. During a test, if no vulnerabilities are found on the target's main site, it is usually worth mining the target system's subdomains. There are many subdomain-mining techniques, such as search-engine queries, subdomain brute forcing, and dictionary-based enumeration.

Of the tools currently available, my favorite is the Layer subdomain excavator.
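
A rough sketch of the dictionary approach, using only the standard library (this is an illustration, not the Layer tool; the wordlist and target domain are placeholders): any candidate name that resolves is treated as an existing subdomain.

import socket

# Tiny dictionary-based subdomain check; wordlist and domain are placeholders
def find_subdomains(domain, wordlist=("www", "mail", "bbs", "admin", "dev")):
    found = []
    for word in wordlist:
        name = word + "." + domain
        try:
            found.append((name, socket.gethostbyname(name)))
        except socket.gaierror:
            pass  # name does not resolve, skip it
    return found

for name, ip in find_subdomains("baidu.com"):
    print(name, ip)

Real tools use far larger wordlists and also have to handle wildcard DNS records, where every candidate name resolves.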

Email harvesting

During penetration of a target system, if the target server is well hardened and access is hard to obtain through the server itself, social engineering is often used to push the attack further. Email phishing is one of the most common techniques: email addresses related to the target are crawled from search result pages and cleaned up, and the harvested accounts are then used to send phishing emails in bulk, tricking users or administrators into logging in or clicking through, which can yield system access.

The libraries used:

import sys
import getopt
import requests
from bs4 import BeautifulSoup
import re

The main program entry point: sys.argv[0] is the path of the script itself, and sys.argv[1:] is the list of all the arguments that follow it.

if __name__ == '__main__':
    # Catch KeyboardInterrupt so the user can abort with Ctrl+C
    try:
        start(sys.argv[1:])
    except KeyboardInterrupt:
        print("interrupted by user,killing all threads...")

The main logic lives in the start() function. The key thing to understand here is getopt.getopt(): it parses the command-line arguments (options introduced by '-' or '--') and returns two lists, the recognized (option, value) pairs and the remaining arguments.

# Main function: parse the arguments supplied by the user
def start(argv):
    url = ""
    pages = ""
    if len(sys.argv) < 2:
        print("-h 帮助信息;\n")
        sys.exit()
    # Handle argument-parsing errors
    try:
        banner()
        opts,args = getopt.getopt(argv,"u:p:h")
    except getopt.GetoptError:
        print('Error: invalid argument!')
        sys.exit()
    for opt,arg in opts:
        if opt == "-u":
            url = arg
        elif opt == "-p":
            pages = arg
        elif opt == "-h":
            usage()

    launcher(url,pages)

opts is the list of parsed (option, value) pairs; args holds any remaining command-line arguments that are not options, that is, anything other than the short or long options defined in getopt() and their values. In the for loop, opt is the first element of each tuple (the option the user typed) and arg is the second element (the value supplied for that option).
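
For example, given the arguments -u www.baidu.com -p 100 and the option string "u:p:h", getopt splits them like this (an illustrative snippet, not part of the tool):

import getopt

# How getopt splits "-u www.baidu.com -p 100"
opts, args = getopt.getopt(['-u', 'www.baidu.com', '-p', '100'], "u:p:h")
print(opts)  # [('-u', 'www.baidu.com'), ('-p', '100')]
print(args)  # []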

Print usage information to make the tool more readable and easier to use:

# Banner
def banner():
    print('\033[1;34m########################################################################################\033[0m\n'
          '\033[1;34m######################################\033[1;32mpython安全实战\033[1;34m#####################################\033[0m\n'
          '\033[1;34m########################################################################################\033[0m\n')

# Usage information
def usage():
    print('-h: --help   show this help;')
    print('-u: --url    target domain;')
    print('-p: --pages  number of search result pages;')
    print('eg: python -u "www.baidu.com" -p 100' + '\n')
    sys.exit()

Next is the main body of the tool, its core part. First, write the launcher function, which drives the searches, deduplicates the results, and makes the code more robust and easier to extend:

# Launcher: query both engines, deduplicate the results, and save them
def launcher(url,pages):
    email_num = []
    key_words = ['email','mail','mailbox','邮件','邮箱','postbox']  # '邮件'/'邮箱' are Chinese for "email"/"mailbox"
    for page in range(1,int(pages)+1):
        for key_word in key_words:
            bing_emails = bing_search(url,page,key_word)
            baidu_emails = baidu_search(url,page,key_word)
            sum_emails = bing_emails + baidu_emails
            for email in sum_emails:
                if email in email_num:
                    pass
                else:
                    print(email)
                    with open('data.txt','a+') as f:
                        f.write(email + '\n')
                    email_num.append(email)

Next, write the two crawler functions that collect the email addresses. Both Bing and Baidu have anti-crawling protection, which can be bypassed by supplying browser-like request headers such as Referer and Cookie:

#bingSearch
def bing_search(url,page,key_word):
    referer = "http://cn.bing.com/search?q=email+site%3abaidu.com&qs=n&sp=-1&pq=emailsite%3abaidu.com&first=1&FORM=PERE1"
    conn = requests.session()
    bing_url = "https://cn.bing.com/search?q="+key_word+"+site%3a"+url+"&qs=n&sp=-1&pq="+key_word+"+site%3a"+url+"&first="+str((page-1)*10)+"&FORM=PERE1"
    conn.get('http://cn.bing.com',headers=headers(referer))
    r = conn.get(bing_url,stream=True,headers=headers(referer),timeout=8)
    emails = search_email(r.text)
    return emails

#baiduSearch
def baidu_search(url,page,key_word):
    email_list = []
    emails = []
    referer = "https://www.baidu.com/s?wd=email+site%3Abaidu.com&pn=1"
    baidu_url = "https://www.baidu.com/s?wd="+key_word+"+site%3A"+url+"&pn="+str((page-1)*10)
    conn = requests.session()
    conn.get(referer,headers=headers(referer))
    r = conn.get(baidu_url, headers=headers(referer))
    soup = BeautifulSoup(r.text, 'lxml')
    tagh3 = soup.find_all('h3')
    for h3 in tagh3:
        href = h3.find('a').get('href')
        try:
            r = requests.get(href, headers=headers(referer),timeout=8)
            emails = search_email(r.text)
        except Exception:
            continue  # skip results that fail to load
        for email in emails:
            email_list.append(email)
    return email_list

Email addresses are extracted with a regular expression; optional flag modifiers can be attached to control how the pattern matches.

def search_email(html):
    emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+",html,re.I)
    return emails

def headers(referer):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
               'Accept': '*/*',
               'Accept-Language':'en-US,en;q=0.5',
               'Accept-Encoding':'gzip,deflate',
               'Referer':referer}
    return headers

Modifier description:

re.I  makes the match case-insensitive
re.L  does locale-aware matching
re.M  multi-line matching; affects ^ and $
re.S  makes . match any character, including newlines
re.U  interprets characters according to the Unicode character set; affects \w, \W, \b, and \B
re.X  verbose mode, allowing more flexible formatting so the regular expression is easier to read
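
For example, re.I is what allows the lowercase-only pattern above to match mixed-case addresses (an illustrative snippet):

import re

# re.I makes the lowercase-only character classes match upper case too
pattern = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"
html = "Contact: Admin@Example.COM"
print(re.findall(pattern, html))        # []
print(re.findall(pattern, html, re.I))  # ['Admin@Example.COM']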

Finally, here is the complete code:

import sys
import getopt
import requests
from bs4 import BeautifulSoup
import re

# Main function: parse the arguments supplied by the user
def start(argv):
    url = ""
    pages = ""
    if len(sys.argv) < 2:
        print("-h 帮助信息;\n")
        sys.exit()
    # Handle argument-parsing errors
    try:
        banner()
        opts,args = getopt.getopt(argv,"u:p:h")
    except getopt.GetoptError:
        print('Error: invalid argument!')
        sys.exit()
    for opt,arg in opts:
        if opt == "-u":
            url = arg
        elif opt == "-p":
            pages = arg
        elif opt == "-h":
            usage()

    launcher(url,pages)

# Banner
def banner():
    print('\033[1;34m########################################################################################\033[0m\n'
          '\033[1;34m######################################\033[1;32mpython安全实战\033[1;34m#####################################\033[0m\n'
          '\033[1;34m########################################################################################\033[0m\n')

# Usage information
def usage():
    print('-h: --help   show this help;')
    print('-u: --url    target domain;')
    print('-p: --pages  number of search result pages;')
    print('eg: python -u "www.baidu.com" -p 100' + '\n')
    sys.exit()

# Launcher: query both engines, deduplicate the results, and save them
def launcher(url,pages):
    email_num = []
    key_words = ['email','mail','mailbox','邮件','邮箱','postbox']  # '邮件'/'邮箱' are Chinese for "email"/"mailbox"
    for page in range(1,int(pages)+1):
        for key_word in key_words:
            bing_emails = bing_search(url,page,key_word)
            baidu_emails = baidu_search(url,page,key_word)
            sum_emails = bing_emails + baidu_emails
            for email in sum_emails:
                if email in email_num:
                    pass
                else:
                    print(email)
                    with open('data.txt','a+') as f:
                        f.write(email + '\n')
                    email_num.append(email)

#bingSearch
def bing_search(url,page,key_word):
    referer = "http://cn.bing.com/search?q=email+site%3abaidu.com&qs=n&sp=-1&pq=emailsite%3abaidu.com&first=1&FORM=PERE1"
    conn = requests.session()
    bing_url = "https://cn.bing.com/search?q="+key_word+"+site%3a"+url+"&qs=n&sp=-1&pq="+key_word+"+site%3a"+url+"&first="+str((page-1)*10)+"&FORM=PERE1"
    conn.get('http://cn.bing.com',headers=headers(referer))
    r = conn.get(bing_url,stream=True,headers=headers(referer),timeout=8)
    emails = search_email(r.text)
    return emails

#baiduSearch
def baidu_search(url,page,key_word):
    email_list = []
    emails = []
    referer = "https://www.baidu.com/s?wd=email+site%3Abaidu.com&pn=1"
    baidu_url = "https://www.baidu.com/s?wd="+key_word+"+site%3A"+url+"&pn="+str((page-1)*10)
    conn = requests.session()
    conn.get(referer,headers=headers(referer))
    r = conn.get(baidu_url, headers=headers(referer))
    soup = BeautifulSoup(r.text, 'lxml')
    tagh3 = soup.find_all('h3')
    for h3 in tagh3:
        href = h3.find('a').get('href')
        try:
            r = requests.get(href, headers=headers(referer),timeout=8)
            emails = search_email(r.text)
        except Exception:
            continue  # skip results that fail to load
        for email in emails:
            email_list.append(email)
    return email_list

def search_email(html):
    emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+",html,re.I)
    return emails

def headers(referer):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
               'Accept': '*/*',
               'Accept-Language':'en-US,en;q=0.5',
               'Accept-Encoding':'gzip,deflate',
               'Referer':referer}
    return headers

if __name__ == '__main__':
    # Catch KeyboardInterrupt so the user can abort with Ctrl+C
    try:
        start(sys.argv[1:])
    except KeyboardInterrupt:
        print("interrupted by user,killing all threads...")

Code run result (screenshot omitted).

Summary: this kind of information gathering leans heavily on basic crawler techniques, so a solid foundation in Python crawlers is essential. Also note that the code above is not set in stone: the search engines' URLs and page layouts may change, and the script will need to be adjusted accordingly.

Origin blog.csdn.net/weixin_45007073/article/details/113136441