Web crawling by search engines

1. Is there any way to prevent search engines from crawling the website?

The first method: robots.txt
Check whether there is a robots.txt file in the root directory of the site; if there is not, create one and upload it.
User-agent: *
Disallow: /
The rules above disallow all search engines from accessing any part of the website.
User-agent: *
Disallow: /css/
Disallow: /admin/
The rules above disallow all search engines from accessing the css and admin directories. Replace css and admin with the directories or files you want to block on your own site.

The second method: robots meta tag
Add <meta name="robots" content="noarchive"> between <head> and </head> in the page's HTML. The noarchive value tells search engines not to display a cached snapshot of the page; it does not by itself stop crawling or indexing. To ask search engines to keep the page out of their results, use content="noindex" instead (the two values can be combined as content="noindex, noarchive").

Note: Even after the blocking rules have been added, the pages may still turn up in search results for a while, because the search engine's index takes time to update. Although Baiduspider has stopped accessing the pages on your website, it may take several months for index entries already stored in the Baidu search engine database to be cleared.
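The robots.txt rules from the first method can be checked locally before they are uploaded. The following is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs and the helper name is_allowed are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rule sets corresponding to the two robots.txt examples in the first method.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_DIRS = """\
User-agent: *
Disallow: /css/
Disallow: /admin/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real site from the article.
    for url in (
        "https://example.com/index.html",
        "https://example.com/admin/login",
    ):
        print(url, is_allowed(BLOCK_DIRS, url))
```

This only shows how a crawler that honors robots.txt would interpret the rules; it says nothing about pages that are already in a search engine's index.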



 

2. Can search engines crawl JS?

1. Search engines generally do not crawl content generated by JS, although Google does fetch and analyze JS, and some search engines are already able to extract links from JavaScript, and even execute the script and follow the links. In practice, sites that rely heavily on JavaScript or Flash cause trouble for crawling and indexing. So if you do not want to be indexed by search engines, the most direct method is still to write a robots.txt file.
2. The navigation of some hyperlinks is implemented entirely in JavaScript. For example, an onclick event handler is attached to an HTML <a> element, and clicking the hyperlink runs JavaScript code that performs the page navigation.
3. The multi-level menus on some pages are implemented in JavaScript, which controls when the menu is shown and hidden. If the menu items navigate to other pages, that navigation information is difficult for crawlers to pick up.
4. Avoid using JavaScript for navigation and other links. Navigation and links are how search engines discover pages; if a search engine cannot crawl a page, the page will not appear in its index, and ranking is out of the question. Also try to avoid using JavaScript to render content, especially keyword-related content, otherwise the keyword density seen by the search engine will be reduced.
5. Where JavaScript is really needed, put the scripts in one or several external .js files so that they do not interfere with the search engine's crawling and analysis of the page. Scripts that cannot be moved into a .js file should go at the bottom of the HTML, just before </body>, so that the search engine encounters them last when analyzing the page, which reduces the interference.
6. Because ordinary search engines have difficulty processing JavaScript, this limitation can be put to use to hide page content that does not need to be indexed, which raises the keyword density of the page. Such content can be thought of as "spam" information: for example, advertisements, copyright notices, large numbers of outgoing links, and information unrelated to the page content. All of it can be moved into one or several .js files, reducing interference with the actual content of the page, improving keyword density, and showing the core of the page content to search engines.
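To make the difference between plain links and JavaScript-driven navigation concrete, here is a minimal sketch of how a crawler that does not execute scripts might collect links. It uses Python's standard html.parser module; the class name StaticLinkExtractor, the helper extract_links, and the sample page are hypothetical, and real search engine crawlers are more sophisticated than this.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collect only links that appear as plain href attributes (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip "#" placeholders and javascript: pseudo-links; finding their
        # real targets would require executing the script.
        if href and not href.startswith(("#", "javascript:")):
            self.links.append(href)


# Hypothetical page: one plain link and two links that navigate via JavaScript.
SAMPLE_PAGE = """
<a href="/about.html">About</a>
<a href="#" onclick="location.href='/products.html'">Products</a>
<a href="javascript:openMenu('/contact.html')">Contact</a>
"""


def extract_links(html: str) -> list:
    """Return the links a non-script-executing parser can discover."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

In this sketch only the plain href link is discovered; the two pages reachable solely through onclick or javascript: navigation are never seen, which is the situation items 2 to 4 describe.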

 

