The latest and most complete web reconnaissance and information-gathering tool of 2023

Introduction

Project address: https://github.com/killmonday/whatscan

In the process of information collection, we often encounter the following problems:

  1. Among the web assets found by cyberspace search engines and scanning tools such as kscan and fscan, a large number of pages with status code 302 or 200 have no title and cannot be identified. Some pages genuinely have no title, but in most cases the title is empty because the scanner never reached the final page. Either the scanner does not follow the 302 redirect, or the status code is 200 and the redirect happens in the page's JS, which the scanner cannot render and therefore cannot follow. As a result, it is unclear what many assets are, let alone their attributes.

  2. Because of the target's language, the nature of a site is often unclear from the page title. Even after opening the page manually, it still has to be translated, so it is hard to judge whether a website belongs to the target.

  3. After collecting a large number of target domain names and IPs, you may want to probe the target's C segments (/24 ranges) for further target assets. But even when web assets are found, problems 1 and 2 still apply. Overall, it is laborious work.
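Problem 1 can be made concrete with a small sketch. It is not whatscan's code (whatscan renders pages in a real browser); it only illustrates what a scanner that reads raw HTML sees. The function name and the two regular expressions are assumptions for illustration, and they cover only the most common redirect forms.

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# <meta http-equiv="refresh" content="0; url=/login">
META_REFRESH_RE = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["'][^"']*url=([^"'>\s]+)""",
    re.IGNORECASE,
)
# window.location.href = "/portal/index.html" and similar assignments
JS_REDIRECT_RE = re.compile(
    r"""(?:window\.|document\.|top\.|self\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def inspect_static_html(html):
    """Return (title, redirect_target) as seen without executing JavaScript."""
    match = TITLE_RE.search(html)
    title = match.group(1).strip() if match else ""
    for pattern in (META_REFRESH_RE, JS_REDIRECT_RE):
        redirect = pattern.search(html)
        if redirect:
            return title, redirect.group(1)
    return title, None


if __name__ == "__main__":
    # A 200 response whose only content is a JS redirect: the title is empty.
    page = '<html><head><script>window.location.href="/portal/index.html";</script></head></html>'
    print(inspect_static_html(page))
```

A page like the sample returns status 200 with an empty title, so a non-rendering scanner records nothing useful even though the real page sits one redirect away.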

To solve these problems, the author developed whatscan. For a batch of URLs it performs web scanning, screenshots, title translation, high-frequency word recognition, and web component recognition, and outputs the results as Excel and Word documents that can be used for review, organization, and report writing.
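As a rough idea of what high-frequency word recognition involves, here is a minimal sketch using only the standard library. It is an assumption about the general technique, not whatscan's implementation: the function name, the stop-word list, and the tokenization rule are all hypothetical, and the simple regex tokenizer only suits languages that separate words with spaces.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "you", "your"}


def top_words(html, n=5):
    """Return the n most frequent words of the visible page text."""
    # Drop script/style bodies first, then every remaining tag.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Words are runs of at least three letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = (
        "<html><script>var login = 1;</script><body><h1>Bank portal</h1>"
        "<p>Bank login for the bank staff</p></body></html>"
    )
    print(top_words(sample, 3))
```

When a page title is empty or uninformative, the most frequent words often still hint at what the site is.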

Web component identification uses kscan's 24,000 (2.4w) web fingerprints. The kscan.dll in the project directory is built from modified kscan source code, with an added Go function (exported via cgo) for running fingerprint identification locally. It is compiled into a DLL and then called from Python. The exported function in the DLL performs fingerprint recognition directly, without an unnecessary second probe. kscan's fingerprints are in static\fingerprint.txt in the project directory. You can customize them and add new fingerprints, so the tool is extensible (thanks to kscan).
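To illustrate what local fingerprint identification means, the sketch below matches keyword rules against a response that has already been fetched, with no further requests. The rule format, the `Rule` class, and the three sample rules are invented for illustration. They are not kscan's fingerprint syntax and not the interface of kscan.dll.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    product: str   # name reported when the rule matches
    field: str     # which part of the response to search: "title", "header" or "body"
    keyword: str   # case-insensitive substring to look for


def identify(response, rules):
    """Return the sorted product names whose keyword occurs in the given field."""
    hits = set()
    for rule in rules:
        haystack = response.get(rule.field, "")
        if rule.keyword.lower() in haystack.lower():
            hits.add(rule.product)
    return sorted(hits)


# Hypothetical rules, for illustration only.
RULES = [
    Rule("nginx", "header", "server: nginx"),
    Rule("WordPress", "body", "/wp-content/"),
    Rule("phpMyAdmin", "title", "phpmyadmin"),
]

if __name__ == "__main__":
    response = {
        "title": "My blog",
        "header": "Server: nginx/1.18.0\r\nContent-Type: text/html",
        "body": '<link href="/wp-content/themes/x.css">',
    }
    print(identify(response, RULES))
```

Because matching runs on data the browser has already collected, adding a rule costs no extra network traffic, which is the point of avoiding a second probe.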

Overall, the functions of whatscan are as follows:

  • Web asset identification (CMS, application components, containers, programming languages, and other information)

  • Screenshots of websites

  • Browser simulation, which addresses the inability of ordinary crawlers to render JS: it can follow JS redirects and obtain the real page and title

  • Title translation (calls Google Translate, so you need network access to Google, for example through a proxy)

  • Extraction and translation of high-frequency words from pages

  • IP and domain name resolution

  • Export to Word and Excel documents

  • Suited to probing and sorting out the assets in multiple C segments after extracting those C segments from the core assets, to see what exists and whether any assets need attention
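For the C-segment use case in the last item, the sketch below shows one way to turn a few known IPs into candidate URLs, using Python's standard `ipaddress` module. It is an assumption about a possible preparation step, not part of whatscan: the function names and the choice of ports 80 and 443 are hypothetical, and in practice you would probe only hosts and ports that a port scanner has reported open.

```python
import ipaddress


def c_segments(ips):
    """Return the unique /24 networks (C segments) containing the given IPv4 addresses."""
    nets = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips}
    return sorted(nets)


def urls_for_segment(net, ports=(80, 443)):
    """Yield one candidate URL per usable host address and port in the network."""
    scheme = {443: "https"}  # every other port defaults to http
    for host in net.hosts():
        for port in ports:
            yield f"{scheme.get(port, 'http')}://{host}:{port}"


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737) stand in for real core assets.
    segments = c_segments(["192.0.2.17", "192.0.2.200", "198.51.100.5"])
    for net in segments:
        print(net, sum(1 for _ in urls_for_segment(net)), "candidate URLs")
```

The generated URLs can be written one per line to the input file that whatscan reads.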

Usage

This project uses selenium with chromedriver to drive a headless browser. You therefore need to install the Chrome browser, then visit https://chromedriver.chromium.org/downloads to download the chromedriver.exe that matches your current Chrome version and place it in the project path. If the chromedriver.exe version does not match, the program may flood the screen with exceptions as soon as it starts. It may appear to be running, but it does no useful work, and you still need to switch to the correct version.
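A version mismatch is easy to check before a long scan. The sketch below compares major versions, on the assumption that a ChromeDriver release is generally paired with the Chrome release of the same major version. The function names are hypothetical, and the two version strings are assumed to have been obtained already, for example from `chromedriver.exe --version` and from the browser's About page.

```python
import re


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True when browser and driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


if __name__ == "__main__":
    chrome = "Google Chrome 120.0.6099.109"
    driver = "ChromeDriver 114.0.5735.90 (sample build string)"
    if not versions_match(chrome, driver):
        print("chromedriver does not match Chrome; download the matching version")
```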

Before running, modify config.ini in the current directory and configure it according to your needs:

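    [set]
    # Number of browser threads; no more than 50 is recommended unless CPU and network are good
    browser_thread=20
    # Input file name. One URL per line; place the file in the input folder
    input_file=input.txt
    # Line of the input file at which to start probing
    read_index=1

    # Whether to use a SOCKS5 proxy when probing
    use_proxy=1
    # Whether to use a proxy when accessing the Google Translate API
    translate_using_proxy=1

    # Whether to use Google Translate to translate titles and high-frequency words
    need_tanslate=1
    # Whether to enable high-frequency word analysis
    need_word_freq=1

    # Timeout for Google Translate API requests
    google_tran_api_timeout=30
    # Maximum page load time
    set_page_load_timeout=30

    # Proxy server IP
    proxy_server=127.0.0.1
    # Proxy server port
    proxy_port=10809

    q_input_length=50
    q_output_length=50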
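If you want to sanity-check config.ini before a run, Python's standard `configparser` can read this format directly. The sketch below is an assumption about how such a file could be parsed, not whatscan's own loading code. The `load_settings` function and the `proxy_url` key are hypothetical, while the section and option names come from the file above.

```python
import configparser

# A shortened copy of config.ini, used here so the sketch runs without the file.
SAMPLE = """\
[set]
# browser threads
browser_thread=20
input_file=input.txt
read_index=1
use_proxy=1
proxy_server=127.0.0.1
proxy_port=10809
"""


def load_settings(text):
    """Parse the [set] section and convert values to proper Python types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["set"]
    settings = {
        "browser_thread": section.getint("browser_thread"),
        "input_file": section.get("input_file"),
        "read_index": section.getint("read_index"),
        "use_proxy": section.getboolean("use_proxy"),  # "1" is parsed as True
    }
    if settings["use_proxy"]:
        # Hypothetical convenience key combining proxy host and port.
        settings["proxy_url"] = "socks5://{}:{}".format(
            section.get("proxy_server"), section.getint("proxy_port")
        )
    return settings


if __name__ == "__main__":
    print(load_settings(SAMPLE))
```

To read the real file instead of the sample string, `parser.read("config.ini", encoding="utf-8")` keeps non-ASCII comments intact.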


Before the first run, install the dependencies: pip install -r requirements.txt

Then you can execute it directly: python whatscan.py

The output Word and Excel files are saved under output/<timestamp>.

While the program is running, do not open the xlsx file or the Word files under tmp. Office locks any file it has open, and the final merge step would then be unable to read it.

If you end the program with Ctrl+C while it is running, Chrome processes may be left behind. You can run kill-chrome.bat in the project directory to close all chrome and chromedriver processes on the machine.

Usage demonstration

Excel

[Screenshot: Excel output]

Word

In the Word output, the information under "product" is kscan's fingerprint recognition result.

[Screenshots: Word output]


Origin blog.csdn.net/shangguanliubei/article/details/134962916