Robots.txt leaks sensitive information

What is the robots protocol?

The robots protocol is a convention between a website and crawlers. Through the robots.txt file, a website tells search engines which pages may be crawled and which may not.

When a search engine spider visits a site, it first checks whether a robots.txt file exists in the site's root directory, and then determines which paths it is allowed to crawl according to the rules in that file.
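
As a minimal sketch of how a well-behaved crawler consults robots.txt before fetching pages (the domain example.com and the user-agent name MySpider are placeholders, not from the original article), Python's standard urllib.robotparser can be used:

```python
from urllib import robotparser

# Point the parser at the target site's robots.txt.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the file

# A compliant crawler asks before requesting each URL.
print(rp.can_fetch("MySpider", "https://example.com/"))        # home page allowed?
print(rp.can_fetch("MySpider", "https://example.com/admin/"))  # disallowed path?
```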

Why does robots.txt leak sensitive information?

The robots.txt file itself is not a vulnerability; it simply tells search engine spiders which paths may and may not be crawled. The problem is that, to keep spiders away from certain areas, administrators usually list those paths explicitly in the file. Because these Disallow entries often point to the site's admin backend, database management pages, or backup directories, anyone who reads robots.txt can learn exactly where that sensitive content lives.
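
For example, a robots.txt like the hypothetical one below (all paths invented for illustration) keeps spiders out, but at the same time tells any visitor where the backend and database pages are:

```
User-agent: *
Disallow: /admin/          # backend login
Disallow: /phpmyadmin/     # database management
Disallow: /backup/         # site backups
```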

How to check a site's robots.txt for this issue:

You can use a crawler or directory-scanning tool to scan the site for sensitive files and directories and grab the robots file, or simply append /robots.txt to the site's URL and open it directly.
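
The manual check can also be scripted. This is a minimal sketch using only the Python standard library; the target URL is a placeholder:

```python
import urllib.request

target = "https://example.com"  # placeholder target site

try:
    # Append /robots.txt to the site root and fetch it.
    with urllib.request.urlopen(target.rstrip("/") + "/robots.txt", timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        # Print only the Disallow lines, which are the ones that may leak paths.
        for line in body.splitlines():
            if line.lower().startswith("disallow"):
                print(line)
except Exception as exc:
    print("robots.txt not found or request failed:", exc)
```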

How to fix it?

  1. First, be clear that robots.txt should not be used to protect or hide information. Move sensitive files and directories into a single isolated subdirectory and exclude only that directory, so that individual sensitive paths are not spelled out in robots.txt.
  2. Alternatively, set the content of robots.txt to Disallow: / to tell search engines not to crawl any part of the site (see the example below).
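
As an illustration of the second fix, a robots.txt that blocks all crawling is just two lines:

```
User-agent: *
Disallow: /
```

Keep in mind that this only affects crawlers that honor the protocol; as point 1 says, robots.txt is not an access control mechanism.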

You can also find robots.txt generators online and use them to produce a robots.txt that matches your requirements, then study the output.

I am still slowly finding my way in security, and these are just my own notes. Comments and corrections are welcome.
