Summary of Information Collection

About penetration testing

What is penetration testing?

Penetration testing means that security practitioners or penetration testing engineers use their knowledge of attack techniques to simulate a hacker's methods and behavior, penetrating the target step by step to discover its vulnerabilities and hidden risks.

After the penetration test, a professional test report is produced and delivered to the customer. The client then patches the website and server vulnerabilities identified in the report to prevent real intrusions.

The prerequisite for penetration testing is the user's authorization (preferably written authorization); only then may the target site be penetrated. Performing penetration testing on a website without authorization is a serious crime (in practice, many individuals, technical teams, and security companies have fallen into this trap). On June 1, 2017, China's "Network Security Law of the People's Republic of China" came into force, placing further legal restrictions on cybercrime. For details, see the Cybersecurity Law of the People's Republic of China.

White box and black box of penetration testing

Like traditional software testing, penetration testing is also divided into "white box testing" and "black box testing"

In white box testing, the tester is given the target site's source code and other important information. It resembles a code audit, but differs from the logic coverage, loop coverage, and basis path testing of traditional software testing. In a white box penetration test, the tester obtains, with authorization and through normal channels, all kinds of information from the target organization, including the network topology, employee information, and even code snippets of websites or other programs, and can also communicate face-to-face with the organization's employees (sales, product managers, programmers, managers...). The main purpose of this type of test is to simulate unauthorized operations by the enterprise's internal staff.

In a black box test, we are told only the URL or domain of the website and nothing else; the tester then simulates a hacker's attack on the site, starting in a state of complete ignorance of the system. In this type of test, the initial information is usually obtained from DNS, Web, Email, and various public servers.

Besides the white box and the black box, there is a third approach, generally called "covert testing", aimed at the organization under test. Normally, the network management department of an organization that accepts a penetration test is notified that testing will take place within certain time windows, so it can monitor the changes occurring in the network. In a covert test, however, only a few people in the tested organization know that the penetration test exists, which makes it possible to effectively check whether the organization's information security incident monitoring, response, and recovery are in place.

Next, let's take a look at the process and ideas of black box penetration testing!

(Figure: the black-box penetration testing workflow)

As you can see, once the attacker has determined the target, the first thing to do is collect information. As the saying goes, know yourself and know your enemy and you will win every battle: from a website's URL and domain, penetration testers can often discover a whole series of facts about it, such as the website's IP, its operating system, its scripting language, and whether other websites are hosted on the same server.

In the previous article, we mentioned that information collection is divided into passive information collection and active information collection. Information collection is a crucial step: the more detailed the information collected, the more it helps the later stages of the test, and to a large extent it determines whether the penetration succeeds. Passive collection does not interact directly with the target server; it gathers information on the periphery of the target through search engines, social media, and the like, without being noticed by the target system. Active collection is the opposite: it interacts directly with the target system to obtain intelligence related to it.

Neither method is perfect; each has its own advantages. The active approach yields more information, but the target host may log your operations. The passive approach yields relatively little information, but your actions will not be discovered by the target host. In a typical penetration project, information may need to be collected at several different stages, using different methods, to ensure its completeness.

In general: active information collection generates interaction and leaves a record; passive information collection does not generate interaction.

So what information should be collected? Next, let's look at some common information that needs to be collected in penetration testing.

Domain name information collection

After determining the target to be penetrated, that is, knowing the target domain name, we need to collect a series of domain-related items: the real IP corresponding to the domain, whois information, subdomains, and so on.

The Domain Name System (DNS) is an Internet service. As a distributed database that maps domain names and IP addresses to each other, it makes the Internet easier for people to use. Simply put, it is the system that translates a domain name into a machine-recognizable IP address.

Domain name resolution

For example, the domain name baidu.com corresponds to the IP address 180.101.49.12. If we enter 180.101.49.12 in the browser's address bar, we can also reach the Baidu search engine. When we type the site's name instead, DNS converts the human-friendly name baidu.com into the machine-friendly IP address 180.101.49.12. To take 163.com as another example: news.163.com is a subdomain of 163.com, and news is the hostname under 163.com.

Determine the real IP address corresponding to the domain name

Before collecting the real IP address behind a domain name, we first need to check whether the domain sits behind a CDN. This can be done with online multi-location ping and website speed-test services (webmaster tools sites). In general, if more than one IP is returned, the address is not the real server address. If there are two or more addresses belonging to different ISPs in the same area, they are very likely the server's egress addresses: the server reaches the Internet through NAT across different ISPs on the internal network, which also provides load balancing and hot standby across several ISPs at once. If there are multiple IP addresses distributed across different regions, you can basically conclude that a CDN is in use. As for how to find a website's real IP by bypassing the CDN, see the article on bypassing CDN to query a site's real IP.
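The decision rule above can be written down as a small helper. This is an illustrative heuristic only; the function name and the `(ip, region)` input format are my own assumptions, not the API of any real tool:

```python
def classify_ips(records):
    """Rough CDN heuristic from multi-location ping results.

    `records` is a list of (ip, region) pairs: each returned IP
    together with the region it geolocates to.
    """
    ips = {ip for ip, _ in records}
    regions = {region for _, region in records}
    if len(ips) <= 1:
        # One IP everywhere: likely the real server address.
        return "single IP: likely the real server address"
    if len(regions) <= 1:
        # Several IPs, same area: likely multi-ISP NAT egress addresses.
        return "multiple IPs in one area: likely multi-ISP egress (NAT)"
    # Several IPs spread across regions: the classic CDN signature.
    return "multiple IPs across regions: likely CDN"
```

For instance, probes returning the same IP from every location suggest a real server, while distinct IPs in distinct regions suggest a CDN.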

whois query

Whois is a transport protocol used to look up a domain name's IP and its owner. Simply put, whois is a database for checking whether a domain name has been registered, and an authoritative record of registered domains (domain owner, registrar, and so on).

Whois information for different domain suffixes must be queried in different whois databases; the whois database for ".com", for example, differs from the one for ".edu". The whois information of each domain or IP is kept by the corresponding administrative body: whois data for domains ending in ".com" is managed by the ".com" registry operator VeriSign, while the Chinese top-level domain ".cn" is managed by CNNIC (the China Internet Network Information Center).

In the early days, whois lookups were mostly done through command-line interfaces; nowadays there are online tools with simplified web interfaces that can query several databases at once. These web tools still rely on the whois protocol to send queries to the servers, and the command-line tools remain widely used by system administrators.

The whois protocol itself is simple: establish a TCP connection to port 43 of the server, send the query keyword followed by a carriage return and line feed, then receive the server's query results.
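That handshake is simple enough to sketch directly with sockets. A minimal illustration of the protocol, not a production client; the default server below is VeriSign's whois server for ".com", and network access is of course required for a live query:

```python
import socket

def build_query(domain: str) -> bytes:
    # The whois protocol expects the query keyword terminated by CRLF.
    return domain.encode("ascii") + b"\r\n"

def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                port: int = 43, timeout: float = 10.0) -> str:
    """Connect to TCP/43, send the keyword + CRLF, read until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query("example.com")` should return the registry's record, assuming outbound TCP/43 is allowed.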

Normally we do whois lookups on the Webmaster's Home (Chinaz) whois query page. Once the information is found, you can run reverse lookups on the email address, registrant, company, telephone number, and so on.
For small and medium-sized websites the domain registrant is generally the webmaster, so feeding the whois results into a search engine often yields more of the registrant's personal information.

whois query method

Common websites for web-interface queries include:
  • Webmaster's Home (Chinaz) whois query: http://whois.chinaz.com/
  • Alibaba Cloud Wanwang query: https://whois.aliyun.com/
  • Whois Lookup (DomainTools), find target site owner information: http://whois.domaintools.com/
  • Whois365 global whois query: https://www.whois365.com/cn/
  • Aizhan webmaster tools query: https://whois.aizhan.com/

Query by whois command

Kali Linux ships with a whois tool, so domain information can be queried with the whois command (taking haoche.cn as an example):

whois haoche.cn


Example:
The following uses the Webmaster's Home whois to look up the author's site scxqn.com (www.scxqn.com). You can see the website title ("Heart Youth (Sichuan) Alumni Association"), along with the registrar, contact person, expiration time, and other information.

ICP filing information

ICP stands for Internet Content Provider. The "Internet Information Service Management Measures" stipulate that websites must be filed for record (ICP filing), and that Internet information services may not be provided without permission.

  • Aizhan webmaster tools: https://whois.aizhan.com
  • Webmaster's Home (Chinaz): https://whois.chinaz.com
  • VirusTotal: https://www.virustotal.com
  • ICP filing lookup: https://www.beianbeian.com
  • Tianyancha: https://www.tianyancha.com
  • China Internet Network Information Center (CNNIC): www.cnnic.com.cn
  • Beian88: www.beian88.com

Collection of subdomain information

A subdomain is a second-level domain (third- and fourth-level subdomains also exist): a domain one level below the top-level domain or below its parent domain. For example, news.baidu.com and tieba.baidu.com are both subdomains of baidu.com.
Suppose the target's network is fairly large and the enterprise protects its main site well: vulnerability discovery, emergency response, vulnerability patching, and hardware security appliances are all in place and timely.
In that case, attacking the main domain head-on is clearly irrational; it is better to look for a breakthrough in one of the target's subdomains, collect information about that subdomain, and then exploit its weaknesses in a roundabout attack.

Commonly used subdomain enumeration detection tools

Most common subdomain detection tools collect subdomains by enumeration. Enumeration needs a good dictionary, and a good dictionary raises the enumeration success rate.

Commonly used subdomain enumeration and brute-force tools include the Layer subdomain excavator, K8, Sublist3r, subDomainBrute, and so on.

Here we briefly introduce the Layer subdomain name excavator and subDomainBrute.

Using Layer Subdomain Excavator
The "Layer subdomain excavator" is a graphical tool with a large built-in subdomain dictionary; it supports multi-threading, can identify a domain's real IP, and is one of the most commonly used subdomain brute-force tools.

The Layer subdomain excavator is easy to use: just enter the domain in the domain box and scan. It supports three modes (service interface, brute-force search, and same-server mining) and can open websites, copy domains, IPs, and CDN info, and export domains, IPs, domain+IP, domain+IP+web server, and live-site lists. The results pane is quite detailed, showing the domain, resolved IP, CDN list, web server, and site status.

There are currently a 5.0 updated version and a 4.x commemorative version; the main difference is that 5.0 adds multi-level subdomain traversal.


Using subDomainBrute

The "subDomainBrute" tool is another subdomain collection tool for penetration targets; its high-concurrency brute-force enumeration can reach 1,000 requests per second. Usage:

python subDomainBrute.py xxx.com

The domain names found by the enumeration will be saved in the "xxx.com.txt" file. Portal: Information collection of subdomains
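The core of such tools is just dictionary-driven DNS resolution. A stripped-down sketch of the idea (the wordlist here is a toy; real tools ship dictionaries with tens of thousands of entries, add concurrency, and filter out wildcard DNS):

```python
import socket

def candidates(domain, words):
    """Combine each dictionary word with the parent domain."""
    return [f"{w}.{domain}" for w in words]

def resolve(host):
    """Return the host's IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

def brute_subdomains(domain, words):
    """Keep only the candidates that actually resolve."""
    found = {}
    for host in candidates(domain, words):
        ip = resolve(host)
        if ip is not None:
            found[host] = ip
    return found
```

For example, `brute_subdomains("baidu.com", ["www", "news", "tieba"])` would return the subset of those subdomains that resolve, mapped to their IPs.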

Collection of sensitive company information online

Once the target is determined, we can query public Internet resources (including but not limited to social networks, recruitment sites, and search engines) for things such as the company's email format, employee names, contact details, and any other information related to the company.

Companies generally publish official contact information on their websites, from which penetration testers can collect email addresses and phone numbers.
Collecting mailbox information serves two main purposes:

  • Discovering the naming convention of the target system's accounts, which can later be used to log in to other subsystems.
  • Providing targets for brute-forcing mailbox logins.

After collecting a few mailboxes, we can roughly guess the naming convention of the target's email addresses. Besides employee mailboxes, companies usually have shared mailboxes such as human resources and customer service addresses. These sometimes have weak passwords, so pay extra attention to them during a test; they often yield unexpected gains.
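A guessed naming convention can be turned into candidate address lists mechanically. A hypothetical helper for illustration (the function name and placeholder syntax are my own invention, not from any real tool):

```python
def gen_emails(names, domain, pattern="{first}.{last}"):
    """Build candidate addresses from (first, last) name pairs.

    Supported placeholders: {first}, {last}, {f} (first initial), {l} (last initial).
    """
    emails = []
    for first, last in names:
        local = pattern.format(first=first.lower(), last=last.lower(),
                               f=first[0].lower(), l=last[0].lower())
        emails.append(f"{local}@{domain}")
    return emails
```

For example, `gen_emails([("Zhang", "San")], "example.com", "{f}{last}")` yields `["zsan@example.com"]`, matching a "first initial + surname" convention.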

Digging by hand or with tools through the web container or page source may turn up sensitive information: for example, a URL path that directly lists the files in a directory, or error messages that leak details about the site.
Crawler-like tools can scan sensitive file paths to find sensitive data, for example files such as "robots.txt", "test.php", and "info.php".

We can also search code hosting platforms such as GitHub and Gitee for related sensitive information. Some careless programmers upload code to hosting platforms without desensitizing it, so the uploaded code may contain database connection strings, email passwords, and possibly leaked source code.
Portal: Company Sensitive Information Collection - Git Information Leakage Vulnerability and GitHack Usage

Fingerprint recognition

To improve development efficiency, enterprises and developers often build on an existing CMS (Content Management System), doing secondary development on top of it. The wide variety of CMSes, combined with developers' uneven skill levels, once made CMS vulnerabilities the hardest-hit area of web penetration. If the target runs a CMS as its content management template, we identify the CMS through information collection; once the CMS fingerprint and version are known, the attack can proceed through the CMS's known vulnerabilities.
Common CMSes include Discuz, DedeCMS ("Dream Weaver"), PHPCMS, WordPress, and so on.

Online fingerprint recognition website:

Common website fingerprint recognition tools:

  • Whatweb
  • Wappalyzer (can be run as a plug-in in the browser)

Server information collection

The information that the server needs to collect mainly includes three aspects: port information, program service version identification, and operating system information identification.

Port information collection and service version identification mainly answer three questions: which ports does the target server have open, what services run on those ports, and what are those services' versions? Different services have different vulnerabilities, and different versions of the same service can differ greatly, so identifying the exact version of each service lets us exploit version-specific vulnerabilities.

Operating system identification determines what type of operating system the target server runs, so that vulnerabilities specific to the OS type and version can be exploited. For example, Windows systems may be vulnerable to EternalBlue, and Linux systems to Dirty COW.

Port scan

Tools can be used to probe the target and its port states. They work by sending packets with specified flag bits over protocols such as TCP or UDP to the target port, then judging the port's state from the packets the target returns.

Port information can be collected with the tools Nmap and masscan:

Use Nmap to collect, the command is: nmap -A -v -T4 target

Use masscan to collect, the command is: masscan -p80 target
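The principle behind these scanners, sending a probe and judging the port by the response, can be illustrated with a plain TCP connect scan. This is a sketch of the simplest and noisiest technique only; Nmap's default SYN scan is stealthier and these tools do far more:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def port_is_open(host, port, timeout=1.0):
    """TCP connect scan: a completed three-way handshake means the port is open."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

def scan_ports(host, ports, workers=50):
    """Scan a list of ports concurrently and return the open ones, sorted."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: (p, port_is_open(host, p)), ports)
    return sorted(p for p, is_open in results if is_open)
```

Only run this against hosts you are authorized to test; even a connect scan leaves log entries on the target.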

Using tools usually leaves traces on the target website, so online detection websites are also an option.

Port attack

Different ports call for different attack methods, because each port is a gate into the server or target system; once the gate is opened, the system can be entered. For example, port 23 (Telnet remote login) transmits information in plain text, so it can be attacked by brute-forcing, sniffing, weak passwords, and similar methods.

Summary of commonly used ports in the penetration testing process -> Portal

Defensive measures

Against port attacks: as long as a port is open and connectable, the corresponding method can be used to attack it. Defensive measures include, but are not limited to:

  • Close unnecessary ports;
  • Set up firewalls for service ports of important businesses;
  • Strengthen employees’ awareness of information security, frequently change user passwords, and avoid blasting weak passwords;
  • Update the software frequently and apply patches (Patch);
  • Use the vulnerability information of the CVE website to improve the security of your own website.

Server version identification

Server information includes the operating system used: Linux or Windows. At present, more than 90% of corporate website servers reportedly run Linux. After identifying the server's OS, you also need to know its specific version, because many older versions have known vulnerabilities.

The easiest way to distinguish Linux from Windows is ping: the default TTL is generally 128 on Windows and 64 on Linux, so a reply TTL somewhat above 100 usually indicates Windows, and one of a few dozen usually indicates Linux. Judging the server type by TTL is not 100% accurate, however: some Windows servers also reply with TTLs in the dozens, and some servers block ping entirely.
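The TTL rule of thumb can be written down directly. The observed reply TTL is the sender's default minus the hop count, so we compare against ranges rather than exact values; the 255 family (many routers and some Unix systems) is included for completeness. A sketch of the heuristic, with all its stated unreliability:

```python
def guess_os_from_ttl(ttl):
    """Rough OS guess from an observed ping reply TTL (not 100% reliable)."""
    if ttl > 128:
        return "network device / some Unix (default TTL 255)"
    if ttl > 64:
        return "Windows (default TTL 128)"
    return "Linux/Unix (default TTL 64)"
```

For example, a reply TTL of 116 suggests a Windows host 12 hops away, while 52 suggests a Linux host 12 hops away.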

To determine the specific version of the target server, you can scan with nmap; both the -O and -A options perform OS detection.

Website sensitive directories and files

Scan the website's directory structure to see whether directories can be traversed or sensitive files are exposed.

Sensitive path detection is mainly done with tools; commonly used ones include Yujian, BurpSuite, wwwscan, and Webdirscan.

  • Admin (backstage) directory: weak passwords, universal passwords, brute-forcing
  • Installation package: obtain database information, or even the website source code
  • Upload directory: truncation tricks, uploading disguised images, and so on
  • MySQL management interface: weak passwords, brute-forcing, universal passwords; from there dump the database, or even get a shell
  • Installation page: may be bypassed to perform a second installation
  • phpinfo: exposes all kinds of configuration information
  • Editors: FCKeditor, KindEditor, and so on
  • IIS short-filename enumeration
  • robots.txt on Windows, Apache, and other setups: it keeps honest crawlers out but not attackers ("guards against gentlemen, not villains")

The robots.txt file is a plain-text file written specifically for search engine robots. In it we can list the directories of the site we do not want robots to visit, so that part or all of the site's content is excluded from search engines, or only the specified content is indexed. We can therefore use robots.txt to keep Google's robots away from important files on our site and blunt the threat of Google hacking.
Suppose the content of the robots.txt file is as follows:

User-agent: *
Disallow: /data/
Disallow: /db/
Disallow: /admin/
Disallow: /manager/
Allow: /images/
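From an attacker's perspective, those Disallow entries are a ready-made list of interesting paths. A small illustrative parser for content like the example above (comments and non-directive lines are skipped; this is a sketch, not a full robots.txt implementation):

```python
def parse_robots(text):
    """Extract (disallowed, allowed) path lists from robots.txt content."""
    disallowed, allowed = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "disallow" and value:
            disallowed.append(value)
        elif key == "allow" and value:
            allowed.append(value)
    return disallowed, allowed
```

Run on the sample file above, it returns the four Disallow paths (prime candidates for directory brute-forcing) and the one Allow path.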

BurpSuite Intruder module

To scan directories with BurpSuite, capture a request, mark the path in the captured packet as a variable in the Intruder module, load a dictionary of directory and file names as the payload, and let it iterate to brute-force the directories.
BurpSuite scans the web directory

Webdirscan

Instructions:

python2 webdirscan.py -o test.txt -t 10 http://www.xxx.com       # -o sets the output file, -t sets the number of threads

Side station and C-segment scan

Side sites are other websites hosted on the same server as the attack target. If no vulnerability is found in the target itself, you can look for vulnerabilities in its side sites: first obtain a webshell on another site, then escalate privileges to take control of the server, and with the server the target website naturally follows.

The C segment refers to other servers in the same internal /24 network segment. An IPv4 address has four octets, A.B.C.D; in 192.168.0.1, the A segment is 192, B is 168, C is 0, and D is 1. C-segment sniffing means compromising one server in the same C segment, that is, some server whose D segment is in 1-255, and then using sniffing tools to take down the target server.
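Enumerating the C segment is just expanding the /24 around a known IP, which Python's standard library can do directly. A sketch; the resulting host list would then be fed to a scanner such as Nmap:

```python
import ipaddress

def c_segment_hosts(ip):
    """List all usable host addresses in the /24 ("C segment") containing `ip`."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    # hosts() excludes the network (.0) and broadcast (.255) addresses.
    return [str(host) for host in network.hosts()]
```

For example, `c_segment_hosts("192.168.0.37")` returns the 254 addresses 192.168.0.1 through 192.168.0.254.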

Online query address of side station and section C:


Origin blog.csdn.net/weixin_42250835/article/details/111566350