Information gathering for web security attack and defense


Information gathering works from the outside in, so that the picture of the target is as complete as possible. A penetration test examines the target from every angle, and the test report should cover as many of the website's vulnerabilities as possible, so anything related to the target website is worth collecting.

Suppose we are given the address of a website, for example https://www.baidu.com.

  • Visible information: Use Google search syntax to look for information about the website; you may find the admin back end or a possible injection point.
  • Application layer: Collect the website's domain name information. Who registered the domain? What is the registrant's email address? (It can be used for password brute-forcing, etc.) How many subdomains does the domain have?
  • Transport layer: TCP and UDP themselves usually reveal nothing useful. They mainly provide delivery and reliability functions, so we do not collect anything at this layer.
  • Network layer: What IPs does the target website have? What services run on those IPs? Which ports on those IPs are open to the outside? Real-world domains often sit behind a CDN, and the CDN nodes usually do not hold the information we want. Our target is the website's origin server, so we need to find its real IP.
  • Data link layer: No critical information here either.

There is also a more specialized but commonly used method: fingerprinting.

0x01 visible information

This is the "Google hacking" that people in the community often talk about.
Common syntax:

  • site: Restrict results to a domain. site:edu.cn (all edu.cn websites)
  • inurl: The URL contains the keyword. inurl:?id= (candidate pages for SQL injection and similar issues)
  • intext: The page body contains the keyword. intext:admin (look for the admin back end)
  • filetype: Restrict results to a file type. filetype:pdf (look for e-books)
  • link: Pages that link to the given URL. link:slug01sh.github.com (the author's own blog, used as an example)
  • cache: Google's cached copy of a page

You can also combine operators in one search: site:edu.cn inurl:?id=
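
The combined search above is just the individual operators joined by spaces. A minimal sketch in Python, assuming a hypothetical helper named build_dork (not part of any library) and assuming the search engine accepts the query in a q= URL parameter:

```python
from urllib.parse import quote_plus


def build_dork(**operators):
    """Join search operators (site=, inurl=, ...) into one query string."""
    # Each keyword argument becomes "operator:value"; parts are space separated.
    return " ".join(f"{name}:{value}" for name, value in operators.items())


def search_url(query, engine="https://www.google.com/search"):
    """Build a search URL; the q= parameter name is an assumption."""
    return f"{engine}?q={quote_plus(query)}"
```

For example, build_dork(site="edu.cn", inurl="?id=") gives the same string as the combined search shown above, and search_url() turns it into a link you can open in a browser.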

You can search in the same way on other search engines and platforms:

  • Baidu
  • Yahoo
  • Bing
  • Shodan
  • GitHub

0x02 application layer

Q: Who registered this website? What is the registrant's email address?

Collect the domain registration records: the domain registrar, registrant information (name, email, address), IP address, and ICP filing information.

  1. whois query
  2. Webmaster's home
  3. Love Station Tool Network
  4. VirusTotal
  5. ICP record query network
  6. Sky Eye Check
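
A whois query is a plain-text request over TCP port 43, so it can be sketched without any third-party tool. The block below is only an illustration: whois_query and extract_emails are hypothetical helper names, the default server whois.verisign-grs.com is assumed to be the right one for .com domains, and many registries redact registrant emails, so the result may be empty.

```python
import re
import socket

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a raw WHOIS request (TCP port 43) and return the text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_emails(whois_text):
    """Return the unique email addresses found in a WHOIS reply, sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(whois_text)})
```

Typical use would be extract_emails(whois_query("baidu.com")); the addresses found are the ones the text suggests keeping for later password testing.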

The main domain is usually well protected, while newly launched subdomains may not have been hardened yet, so consider attacking from the side. Before doing that, collect the subdomain information first. Collection methods:

  • Search engine enumeration. site:baidu.com, simple violence
  • third-party usage. DNSdumpster, online DNS investigation, subdomain blasting website (https://phpinfo.me/domin), IP reverse check binding domain name website (http://dns.aizhan.com)
  • Log transparency public log enumeration. https://crt.sh and https://censys.io
  • Subdomain scanner. Such as: Layer subdomain name digger, subDomainBrute, Sublist3r
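
Certificate Transparency enumeration can be sketched as follows. This is an illustration under assumptions, not a reference client: it assumes crt.sh returns JSON when output=json is passed and that each record carries a newline-separated name_value field; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain):
    """Build the crt.sh query URL; '%.' is a wildcard for any name under domain."""
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return "https://crt.sh/?" + query


def subdomains_from_crtsh(records, domain):
    """Extract unique names under `domain` from decoded crt.sh JSON records."""
    found = set()
    for record in records:
        # name_value (assumed field name) may hold several names, one per line.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):  # drop the wildcard label
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


def fetch_subdomains(domain, timeout=30):
    """Download and parse the certificate log entries for domain."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return subdomains_from_crtsh(json.load(resp), domain)
```

Parsing is kept separate from fetching so the filtering logic can be checked against saved responses without touching the network.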

0x03 network layer

Q: How can you tell whether the site uses a CDN?

Use an online multi-location ping service (https://www.17ce.com). If the domain resolves to different IPs from different locations, it is probably behind a CDN.
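
The idea behind a multi-location ping can be sketched like this: collect the IPs each location sees and check whether they differ. The helper names are hypothetical, resolve_ipv4 only covers the local vantage point, and the answers from other locations are assumed to come from a service such as the one above.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def looks_like_cdn(answers_by_location):
    """Guess CDN usage from a mapping of location name to the set of IPs seen there.

    Different answer sets in different locations suggest a CDN (or other
    geo-based load balancing); identical answers suggest a single origin.
    """
    distinct = {frozenset(ips) for ips in answers_by_location.values() if ips}
    return len(distinct) > 1
```

This is only a heuristic: a site with plain DNS round-robin can also return different IPs, so treat a positive result as a hint to investigate further.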

Q: If the site uses a CDN, how do you obtain the real IP?

  • Access the domain from abroad (CDN pricing differs between domestic and overseas traffic, so overseas requests may not go through the CDN)
  • Check the domain's DNS resolution history (it may contain an IP used in the past)
  • Scan for test files on the website, such as phpinfo or test pages (such test files are often not behind the CDN)
  • Ping a subsite (subsites often do not use the CDN)
  • Packet capture analysis (some data exchanges talk to the real server)
  • Trace through the site's own mail. Find the mail server IP, or ping the mail server's domain name (this works when the website runs its own mail server)

Q: How do you verify that the IP you found is the real one?

Check whether the website served on port 80 of that IP is the same as the website you reach through the domain name.
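
One rough way to automate this comparison is to request the candidate IP on port 80 while sending the domain in the Host header, then compare the page titles. This is a sketch under assumptions: the helper names are hypothetical, and matching titles is only a weak signal that should be confirmed by looking at the pages.

```python
import http.client
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch_page(ip, host, path="/", timeout=10):
    """GET http://ip:80/path while presenting `host` in the Host header."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


def page_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else None


def same_site(html_a, html_b):
    """Treat two pages as the same site when both have the same title."""
    title_a, title_b = page_title(html_a), page_title(html_b)
    return title_a is not None and title_a == title_b
```

In practice you would fetch one page through the candidate IP with fetch_page() and one through the normal domain, then pass both bodies to same_site().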

Q: What should you collect after finding the IP?

  • Port information
  • Web path information

Q: What are the common tools for collecting port information?

  • nmap
  • masscan
  • map
  • Yujian high-speed TCP port scanning tool

TCP/UDP port list: https://zh.wikipedia.org/wiki/TCP/UDP%E7%AB%AF%E5%8F%A3%E5%88%97%E8%A1%A8
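
To see what these tools do at the simplest level, here is a minimal TCP connect scan using only the standard library. It is a sketch, far slower and noisier than nmap or masscan; the function names are hypothetical, and it should only be pointed at hosts you are allowed to test.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_port_open(host, port, timeout=1.0):
    """Return True if a full TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def scan_ports(host, ports, timeout=1.0, workers=50):
    """Check several ports concurrently and return the open ones in input order."""
    ports = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda port: is_port_open(host, port, timeout), ports)
        return [port for port, is_open in zip(ports, results) if is_open]
```

For example, scan_ports("127.0.0.1", [22, 80, 443, 3306, 4000]) returns whichever of those ports accept a connection on the local machine.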

A typical URL looks like http://localhost:4000/article. By this point the domain name and port parts of the URL have been collected. Once the target domain and port are decided, attacking the Web service requires collecting the Web directories and files first.

Q: How do you collect Web directories and files?

Scan with a directory scanner; a commonly used tool is dirsearch.
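
Conceptually, a directory scanner appends each word of a wordlist to the base URL and records the paths that answer. The sketch below shows only that idea and is not how dirsearch itself is implemented; the function names and the choice of "interesting" status codes are assumptions.

```python
import urllib.error
import urllib.request


def candidate_urls(base_url, wordlist):
    """Combine the base URL with each non-empty wordlist entry."""
    base = base_url.rstrip("/") + "/"
    return [base + word.strip().lstrip("/") for word in wordlist if word.strip()]


def probe(url, timeout=5):
    """Return the HTTP status code for url, or None if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status  # redirects are followed, so this is the final status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except (urllib.error.URLError, OSError):
        return None


def scan_paths(base_url, wordlist, interesting=(200, 401, 403)):
    """Return (url, status) pairs for paths that appear to exist."""
    hits = []
    for url in candidate_urls(base_url, wordlist):
        status = probe(url)
        if status in interesting:
            hits.append((url, status))
    return hits
```

A call such as scan_paths("http://localhost:4000", ["admin/", "robots.txt", "article"]) would list which of the three paths respond on the example URL used earlier.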

0x04 fingerprint recognition (blog category)

A CMS is used to manage a website's articles, and websites are usually built on some existing framework. These frameworks may have publicly known vulnerabilities that administrators have not patched in time. You can identify them with online tools or with local tools.
Commonly used local tools:

  • Yujian Web Fingerprint Recognition
  • WhatWeb
  • WebRobo
  • Coconut tree
  • Lightweight Web fingerprint recognition

Common online tools:

  • BugScaner
  • Yunxi fingerprint (云悉指纹)
  • WhatWeb
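
Fingerprinting tools work by matching known markers in a site's response headers and page body. The sketch below uses a tiny, illustrative signature table of my own (three well-known CMS markers); real tools such as WhatWeb ship far larger rule sets, and the function name is hypothetical.

```python
import re

# Illustrative signatures only: (where to look, regular expression).
# "body" means the HTML body; "header:<name>" means one response header.
SIGNATURES = {
    "WordPress": [
        ("body", r"/wp-content/|/wp-includes/"),
        ("body", r'<meta name="generator" content="WordPress'),
    ],
    "Drupal": [
        ("header:x-generator", r"Drupal"),
        ("body", r"/sites/default/files/"),
    ],
    "Joomla": [
        ("body", r'<meta name="generator" content="Joomla'),
    ],
}


def fingerprint(headers, body):
    """Return the names of all CMSs whose signatures match the response."""
    headers = {name.lower(): value for name, value in headers.items()}
    matches = []
    for cms, rules in SIGNATURES.items():
        for where, pattern in rules:
            if where == "body":
                text = body
            else:
                text = headers.get(where.split(":", 1)[1], "")
            if re.search(pattern, text, re.IGNORECASE):
                matches.append(cms)
                break  # one matching rule is enough for this CMS
    return matches
```

Feed it the headers and HTML of the target's home page; an empty list means none of these few signatures matched, not that the site has no CMS.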

Origin blog.csdn.net/qq_43085611/article/details/113115491